[slurm-dev] Re: Running a privileged process from a Spank plugin

2017-09-04 Thread Manuel Rodríguez Pascual

When I've been in that situation I have solved the problem with a lock
on a temporary file.
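
In case it is useful, here is a minimal C sketch of that idea for a Spank
hook such as slurm_spank_task_init_privileged() (the lock file path and the
helper command are placeholders, not anything provided by Slurm itself):

/* Run-once-per-node guard based on an advisory file lock (sketch only). */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if this process won the lock and started the helper,
 * 0 if another task on this node already did it, -1 on error. */
static int run_once_per_node(const char *lockfile)
{
    int fd = open(lockfile, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Only the first task on the node gets the exclusive lock;
     * the others fail the non-blocking flock() and simply skip. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return 0;
    }

    /* Placeholder: launch the privileged helper in the background. */
    if (system("/usr/local/sbin/my_privileged_helper &") != 0) {
        close(fd);
        return -1;
    }
    return 1;   /* keep fd open if the lock should outlive this hook */
}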

If you need any more help, please let me know. I probably still have
some examples around.

best,


Manuel



2017-09-03 15:02 GMT+02:00 Jordi A. Gómez :
> Hello,
>
> I am developing a Spank plugin which starts a privileged process once per
> node. This process will perform some work that requires privileges, like writing
> frequencies, reading devices, etc.
>
> I know I have the option of running it when the Slurm daemon is started on a node
> and loads the Spank plugin. But I would prefer to run it only when it's
> necessary, using some flag (srun --flag).
>
> Because I want to start it once per node, I'm thinking about the
> best way to achieve that. In the remote context I have the function
> slurm_spank_task_init_privileged(), but it's executed by every task on the node and
> I want to start it just once, a single process instance. These are my
> possibilities:
>
> 1) Using a little maths to get the number of tasks per node and calculate the
> minimum task number, which would be my selected task. For example, with 4 tasks
> and 2 nodes, tasks 0 and 2 would run my privileged process. But I'm not
> sure that task numbers will always follow this pattern. And of course,
> if someone runs 3 tasks on the first node and 1 on the second, this makes no
> sense.
>
> 2) Some kind of inter-process sync, or file lock... to ensure that no
> other task on the same machine has already started my process.
>
> Any better ideas?
>
> Thank you,
> Jordi.


[slurm-dev] Re: Account preferation

2017-08-08 Thread Manuel Rodríguez Pascual

Yes, you can use backfill and multifactor together. Sorry if that was
not clear in my previous mail. Multifactor is in charge of sorting the
job queue, and backfill is in charge of executing the jobs.



2017-08-08 16:04 GMT+02:00 Dennis Tants <dennis.ta...@zarm.uni-bremen.de>:
>
> Hello Manuel,
>
> thank you very much for your insight.
>
> I have read about FairShare too, but I am not sure if/how I can use it
> for this specific example.
> But I will consider it!
> Or, can anyone confirm FairShare should work in this case?
>
>
> Just for clarification:
> Can't I use backfill and multifactor priority together?
> First goes backfill, depending on the job length and then it is sorted
> by priority?
> At least those were my first thoughts about it :D
>
> Best regards,
> Dennis
>
> On 08.08.2017 at 14:53, Manuel Rodríguez Pascual wrote:
>> Hi Dennis,
>>
>> If I understand you correctly, what you need is to use the Multifactor Plugin
>> https://slurm.schedmd.com/priority_multifactor.html
>>
>> In particular, I guess this is relevant for your installation:
>>
>> Note: Computing the fair-share factor requires the installation and
>> operation of the Slurm Accounting Database to provide the assigned
>> shares and the consumed, computing resources described below.
>>
>> and
>>
>> PriorityDecayHalfLife   This determines the contribution of historical
>> usage on the composite usage value. The larger the number, the longer
>> past usage affects fair-share. If set to 0 no decay will be applied.
>> This is helpful if you want to enforce hard time limits per
>> association. If set to 0 PriorityUsageResetPeriod must be set to some
>> interval. The unit is a time string (i.e. min, hr:min:00,
>> days-hr:min:00, or days-hr). The default value is 7-0 (7 days).
>>
>>
>> Also, note that backfill and multifactor are two different things:
>> Backfill tries to run small jobs if this does not affect other jobs
>> waiting in the queue; Multifactor determines the order of the queue.
>> They are set with different parameters in slurm.conf
>>
>> SchedulerType=sched/backfill
>> PriorityType=priority/multifactor
>>
>>
>> I am not a Slurm expert, so I may well be wrong on some of this,
>> but I hope it can serve as a small help with your
>> problem.
>>
>> Best,
>>
>>
>> Manuel
>>
>> 2017-08-08 14:37 GMT+02:00 Dennis Tants <dennis.ta...@zarm.uni-bremen.de>:
>>> Hey guys,
>>>
>>> I was asked to implement a walltime of 24 hours in our cluster (we only
>>> have 16 nodes, so no need until now).
>>> Furthermore, I need to prefer single accounts based on compute time. The
>>> example I was given is further below.
>>>
>>> Btw., I'm upgrading to SLURM 17.02.4.
>>>
>>> Example:
>>> If the partition is full, we should start to prefer people who haven't
>>> used much computing time the last month.
>>>
>>> I don't know where I should start with this request to be honest and if
>>> it is even possible (walltime is not the problem).
>>>
>>> First off, I would need to also implement accounting, I guess (also, no
>>> need for it until now^^)
>>> But how to continue? I wanted to implement backfill instead of FIFO, but
>>> would I also need multifactor priority?
>>>
>>> Here are my thoughts of being able to achieve this:
>>>
>>> 1. Could I do something like:
>>> A. People are getting points every month (which they can't actually
>>> empty within one month)
>>> B. Compare the points (last month) of all people when a queue appears.
>>>
>>> 2. Or something like (same but time based):
>>> A. Account the whole computing time for two months (and then start to
>>> delete old entries)
>>> B. While queue, compare computing time (last month) of all waiting users?
>>>
>>> I hope I made myself clear and that you can help me. Every hint is
>>> appreciated.
>>>
>>> Best regards,
>>> Dennis
>>>
>>> --
>>> Dennis Tants
>>> Auszubildender: Fachinformatiker für Systemintegration
>>>
>>> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
>>> ZARM - Center of Applied Space Technology and Microgravity
>>>
>>> Universität Bremen
>>> Am Fallturm
>>> 28359 Bremen, Germany
>>>
>>> Telefon: 0421 218 57940
>>> E-Mail: ta...@zarm.uni-bremen.de
>>>
>>> www.zarm.uni-bremen.de
>>>
>
> --
> Dennis Tants
> Auszubildender: Fachinformatiker für Systemintegration
>
> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
> ZARM - Center of Applied Space Technology and Microgravity
>
> Universität Bremen
> Am Fallturm
> 28359 Bremen, Germany
>
> Telefon: 0421 218 57940
> E-Mail: ta...@zarm.uni-bremen.de
>
> www.zarm.uni-bremen.de
>


[slurm-dev] Re: Account preferation

2017-08-08 Thread Manuel Rodríguez Pascual

Hi Dennis,

If I understand you correctly, what you need is to use the Multifactor Plugin
https://slurm.schedmd.com/priority_multifactor.html

In particular, I guess this is relevant for your installation:

Note: Computing the fair-share factor requires the installation and
operation of the Slurm Accounting Database to provide the assigned
shares and the consumed, computing resources described below.

and

PriorityDecayHalfLife   This determines the contribution of historical
usage on the composite usage value. The larger the number, the longer
past usage affects fair-share. If set to 0 no decay will be applied.
This is helpful if you want to enforce hard time limits per
association. If set to 0 PriorityUsageResetPeriod must be set to some
interval. The unit is a time string (i.e. min, hr:min:00,
days-hr:min:00, or days-hr). The default value is 7-0 (7 days).


Also, note that backfill and multifactor are two different things:
Backfill tries to run small jobs if this does not affect other jobs
waiting in the queue; Multifactor determines the order of the queue.
They are set with different parameters in slurm.conf

SchedulerType=sched/backfill
PriorityType=priority/multifactor
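
As a rough sketch, the fair-share part could then be configured with
something like the following additional slurm.conf entries (the values are
purely illustrative, not a recommendation):

AccountingStorageType=accounting_storage/slurmdbd
# A longer half-life means past usage keeps affecting fair-share for longer
PriorityDecayHalfLife=30-0
PriorityWeightFairshare=10000
PriorityWeightAge=1000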


I am not a Slurm expert, so I may well be wrong on some of this,
but I hope it can serve as a small help with your
problem.

Best,


Manuel

2017-08-08 14:37 GMT+02:00 Dennis Tants :
>
> Hey guys,
>
> I was asked to implement a walltime of 24 hours in our cluster (we only
> have 16 nodes, so no need until now).
> Furthermore, I need to prefer single accounts based on compute time. The
> example I was given is further below.
>
> Btw., I'm upgrading to SLURM 17.02.4.
>
> Example:
> If the partition is full, we should start to prefer people who haven't
> used much computing time the last month.
>
> I don't know where I should start with this request to be honest and if
> it is even possible (walltime is not the problem).
>
> First off, I would need to also implement accounting, I guess (also, no
> need for it until now^^)
> But how to continue? I wanted to implement backfill instead of FIFO, but
> would I also need multifactor priority?
>
> Here are my thoughts of being able to achieve this:
>
> 1. Could I do something like:
> A. People are getting points every month (which they can't actually
> empty within one month)
> B. Compare the points (last month) of all people when a queue appears.
>
> 2. Or something like (same but time based):
> A. Account the whole computing time for two months (and then start to
> delete old entries)
> B. While queue, compare computing time (last month) of all waiting users?
>
> I hope I made myself clear and that you can help me. Every hint is
> appreciated.
>
> Best regards,
> Dennis
>
> --
> Dennis Tants
> Auszubildender: Fachinformatiker für Systemintegration
>
> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
> ZARM - Center of Applied Space Technology and Microgravity
>
> Universität Bremen
> Am Fallturm
> 28359 Bremen, Germany
>
> Telefon: 0421 218 57940
> E-Mail: ta...@zarm.uni-bremen.de
>
> www.zarm.uni-bremen.de
>


[slurm-dev] multiple MPI versions with slurm

2017-07-21 Thread Manuel Rodríguez Pascual

Hi all,

I'm trying to provide support to users demanding different versions of
MPI, namely mvapich2 and OpenMPI. In particular, I'd like to support
"srun -n X ./my_parallel_app", maybe with "--with-mpi=X" flag.


Has anybody got any suggestion on how to do that? I am kind of new
to cluster administration, so probably there is some obvious solution
that I am not aware of.

Cheers,


Manuel


[slurm-dev] Re: different behaviour of signals with sbatch in different machines

2017-07-06 Thread Manuel Rodríguez Pascual

Hi all,

just in case anybody faces this problem at some point...

I found the solution with a set of good examples in
http://mywiki.wooledge.org/SignalTrap

Applied to my problem (full code in the quoted mail below), the fix comes down to executing

"sh son.sh & wait $!"

in the father script.
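
Applied to father.sh from the quoted mail below, the end of the script would
look roughly like this (the rest is unchanged); the point is that bash delays
trap handlers while a foreground child is running, but runs them immediately
while it sits in "wait":

cat /dev/zero > /dev/null &

sh son.sh &    # run the child in the background...
wait $!        # ...and wait for it, so the TERM trap can fire right away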

Best regards,

Manuel

2017-07-04 15:55 GMT+02:00 Manuel Rodríguez Pascual
<manuel.rodriguez.pasc...@gmail.com>:
>
> Hi all,
>
> Developing a Slurm plugin I've come to a funny problem. I guess it is not
> strictly related to Slurm but just system administration, but maybe someone
> can point me in the right direction.
>
> I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS 6).
> When I send a signal to finish a running task, the behaviour is different.
>
> It can be seen with 2 nested scripts, based on slurm_trap.sh by Mike Dacre
> (https://gist.github.com/MikeDacre/10ae23dcd3986793c3fd ). The code is at the
> bottom of the mail. As can be seen, both father and son are capturing SIGTERM
> and other termination signals. The execution consists of "father" calling "son", and "son"
> waiting forever until it is killed.
>
>
> As you can see in the execution results (bottom of the mail), one of the
> machines executes the functions set in "trap", but the other does not.
> Moreover, this second machine does execute the trap functions when only a
> single script is executed, not two nested ones.
>
> Have you got an explanation for this? Is it possible to ensure that the
> "trap" command will always be executed?
>
> Thanks for your help,
>
> Manuel
>
> -
> -
> -bash-4.2$ more father.sh
>
> #!/bin/bash
>
> trap_with_arg() {
> func="$1" ; shift
> for sig ; do
> trap "$func $sig" "$sig"
> done
> }
>
> func_trap() {
> echo father: trapped $1
> }
>
> trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
>
> cat /dev/zero > /dev/null &
>
> sh son.sh
> -bash-4.2$ more son.sh
> #!/bin/bash
>
>
> trap_with_arg() {
> func="$1" ; shift
> for sig ; do
> trap "$func $sig" "$sig"
> done
> }
>
> func_trap() {
> echo son: trapped $1
> }
>
> trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
>
> cat /dev/zero > /dev/null &
> wait
> -
> -
>
>
> Output in CentOS7:
> -bash-4.2$ sbatch  father.sh
> Submitted batch job 1563
> -bash-4.2$ scancel 1563
> -bash-4.2$ more slurm-1563.out
> slurmstepd: error: *** JOB 1563 ON acme12 CANCELLED AT 2017-07-04T15:39:00 ***
> son: trapped TERM
> son: trapped EXIT
> father: trapped TERM
> father: trapped EXIT
>
> Output in BullX:
> ~/signalTests> sbatch  father.sh
> Submitted batch job 233
> ~/signalTests> scancel 233
> ~/signalTests> more slurm-233.out
> slurmstepd: error: *** JOB 233 ON taurusi5089 CANCELLED AT 
> 2017-07-04T15:43:54 ***
>
> Output in BullX, just son:
> ~/signalTests> sbatch -- son.sh
> Submitted batch job 235
> ~/signalTests> scancel 235
> ~/signalTests> more slurm-235.out
> slurmstepd: error: *** JOB 235 ON taurusi4061 CANCELLED AT 
> 2017-07-04T15:48:29 ***
> son: trapped TERM
> son: trapped EXIT
>
>
>
>
>


[slurm-dev] different behaviour of signals with sbatch in different machines

2017-07-04 Thread Manuel Rodríguez Pascual
Hi all,

Developing a Slurm plugin I've come to a funny problem. I guess it is not
strictly related to Slurm but just system administration, but maybe someone
can point me in the right direction.

I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS 6).
When I send a signal to finish a running task, the behaviour is
different.

It can be seen with 2 nested scripts, based on slurm_trap.sh by Mike Dacre
(https://gist.github.com/MikeDacre/10ae23dcd3986793c3fd ). The code is at
the bottom of the mail. As can be seen, both father and son are capturing
SIGTERM and other termination signals. The execution consists of "father" calling "son", and
"son" waiting forever until it is killed.


As you can see in the execution results (bottom of the mail), one of the
machines executes the functions set in "trap", but the other does not.
Moreover, this second machine does execute the trap functions when only
a single script is executed, not two nested ones.

Have you got an explanation for this? Is it possible to ensure that the
"trap" command will always be executed?

Thanks for your help,

Manuel

-
-
-bash-4.2$ more father.sh

#!/bin/bash

trap_with_arg() {
func="$1" ; shift
for sig ; do
trap "$func $sig" "$sig"
done
}

func_trap() {
echo father: trapped $1
}

trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM

cat /dev/zero > /dev/null &

sh son.sh
-bash-4.2$ more son.sh
#!/bin/bash


trap_with_arg() {
func="$1" ; shift
for sig ; do
trap "$func $sig" "$sig"
done
}

func_trap() {
echo son: trapped $1
}

trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM

cat /dev/zero > /dev/null &
wait
-
-


Output in CentOS7:
-bash-4.2$ sbatch  father.sh
Submitted batch job 1563
-bash-4.2$ scancel 1563
-bash-4.2$ more slurm-1563.out
slurmstepd: error: *** JOB 1563 ON acme12 CANCELLED AT 2017-07-04T15:39:00
***
son: trapped TERM
son: trapped EXIT
father: trapped TERM
father: trapped EXIT

Output in BullX:
~/signalTests> sbatch  father.sh
Submitted batch job 233
~/signalTests> scancel 233
~/signalTests> more slurm-233.out
slurmstepd: error: *** JOB 233 ON taurusi5089 CANCELLED AT
2017-07-04T15:43:54 ***

Output in BullX, just son:
~/signalTests> sbatch -- son.sh
Submitted batch job 235
~/signalTests> scancel 235
~/signalTests> more slurm-235.out
slurmstepd: error: *** JOB 235 ON taurusi4061 CANCELLED AT
2017-07-04T15:48:29 ***
son: trapped TERM
son: trapped EXIT


[slurm-dev] Re: How to print information message at submission time.

2017-06-20 Thread Manuel Rodríguez Pascual

It's maybe a bit of a hack, but I guess it could be done with a Spank
plugin. Just put whatever you want to print in the slurm_spank_init method,
as it is called every time a job is submitted. As a drawback, it would
also be printed with srun, and maybe when starting slurmctld (not a
big deal anyway).
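
A minimal sketch of that idea (untested; the message text and plugin name are
placeholders, and the spank_context() check just limits the output to the
srun/sbatch/salloc side):

/* submit_banner.c - print a message from the submission commands. */
#include <slurm/spank.h>

SPANK_PLUGIN(submit_banner, 1);

int slurm_spank_init(spank_t sp, int ac, char **av)
{
        if (spank_context() == S_CTX_LOCAL ||
            spank_context() == S_CTX_ALLOCATOR)
                slurm_info("Please remember to set a walltime with --time");
        return ESPANK_SUCCESS;
}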

2017-06-19 23:54 GMT+02:00 TO_Webmaster :
>
> Yes, unfortunately, this does not work out of the box. We now use the
> AdminComment field to save the information and a small wrapper script
> around sbatch that prints the admin comment field if submission
> succeeded.
>
> 2017-06-19 22:51 GMT+02:00 Marcin Stolarek :
>> I don't think it's possible to print a message if job is accepted to the
>> queue..
>>
>> 2017-06-19 18:16 GMT+02:00 :
>>>
>>> Hello,
>>>
>>> I'm using the job_submit plugin (C language) to manage users' job submission on
>>> our systems.
>>>
>>> I would like to print an information message to the terminal each time a
>>> job is submitted, and I can't find how to do it.
>>>
>>>
>>>
>>> It works fine in case of error message using err_msg parameter of the
>>> job_submit function :
>>>
>>> extern int job_submit(struct job_descriptor *job_desc, uint32_t
>>> submit_uid, char **err_msg)
>>>
>>>
>>>
>>>
>>> Best regards,
>>> Gerard Gil
>>>
>>>
>>> Centre Informatique National de l'Enseignement Superieur
>>> 950, rue de Saint Priest
>>> 34097 Montpellier CEDEX 5
>>> FRANCE
>>>
>>>
>>


[slurm-dev] thoughts on task preemption

2017-05-22 Thread Manuel Rodríguez Pascual
Hi all,

After working with the developers of DMTCP checkpoint library, we have a
nice working version of Slurm+DMTCP. We are able to checkpoint any batch
job (well, most of them) and restart it anywhere else in the cluster. We
are testing it thoroughly, and will let you know in a few weeks in case any
of you is interested in testing/using it.

Anyway, now that this is ready, we are working on some uses for this new
functionality. An interesting one is job preemption: if a job is running
but another with a higher priority comes, then checkpoint the first one,
cancel it, run the second, and restart the first one somewhere else.

Slurm already has support for this, so it is kind of trivial from a
technical point of view. I am however not fully convinced about how this
should work. If possible, I'd like to hear your thoughts as expert system
administrators.

A key thing is that, while we are able to checkpoint/restart most of the
jobs (both serial and MPI), we can checkpoint only batch jobs: no
srun, no salloc. Also, jobs running on GPUs or Xeon Phi cannot currently be
checkpointed.

My question is, what should happen to these non-checkpointable jobs whenever
one with a higher priority comes? One alternative would be to preempt only
jobs with checkpoint support, so no computation is lost; the other would be
to preempt whatever is necessary to run the job as soon as possible, not
caring about being able to restore it later. I can imagine scenarios where
one of the alternatives is better than the other one, but I am not sure of
how realistic they are. As system administrators, would you have any
preference on this?

Also, the next question is what happens with the job to be restarted. With
the current Slurm implementation it goes back to the queue. The problem with this
is that, if there are many jobs in the queue, this partially-completed one
will have to wait a lot before restarting. From my point of view it would
make sense to put it on top of the queue, so it restarts as soon as there
is a free slot. This can be easily changed in the code, but I'd love to
hear your point of view before modifying anything.

So, any ideas/suggestions?

Thanks for your help. Best regards,

Manuel


[slurm-dev] weird integration bewteen Spank and "--export"

2017-02-16 Thread Manuel Rodríguez Pascual

Hi all,

I am experiencing a strange behavior in a plugin that I created. I
don't know if this is what I should expect, so maybe you can provide
me with some insights.

I want to have some stuff done in slurm_spank_task_init, but ONLY if
sbatch/srun was executed with "--myplugin". To do so, I created a
global variable "activate_plugin_var" in my myplugin.c, which is set
in an auxiliary function "set_ativate_plugin_var". This function is
registered as a spank_option, so when the user uses "--myplugin" it is
called and this global variable activate_plugin_var  gets a value.
Then, in slurm_spank_task_init, I check the value of the variable and
if it is set, then I do whatever.

So far this works fine. I don't know, however, if this is the most
elegant solution. Probably the best idea would be to do the work in
the auxiliary set_ativate_plugin_var function instead of employing
this flag. The problem is that I want to access the values of the
spank_t object received in slurm_spank_task_init, and with the
registered function you don't have them, so that's why I came up with
this approach. The main problem I see is that the state is not saved
"forever", so the value of any global variable is not always what
you'd expect. If there is any better solution, I am open to
suggestions.
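
For reference, the pattern described above looks roughly like this (a
condensed, untested sketch; the option and function names are placeholders):

#include <slurm/spank.h>

SPANK_PLUGIN(myplugin, 1);

static int activate_plugin_var = 0;

/* Callback registered for --myplugin: it only sets the global flag. */
static int set_activate_plugin_var(int val, const char *optarg, int remote)
{
        activate_plugin_var = 1;
        return ESPANK_SUCCESS;
}

struct spank_option spank_options[] = {
        { "myplugin", NULL, "Enable the extra work done by this plugin",
          0, 0, set_activate_plugin_var },
        SPANK_OPTIONS_TABLE_END
};

int slurm_spank_task_init(spank_t sp, int ac, char **av)
{
        if (!activate_plugin_var)   /* user did not pass --myplugin */
                return ESPANK_SUCCESS;
        /* ... do the real work here, using spank_get_item(sp, ...) ... */
        return ESPANK_SUCCESS;
}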

The weird thing, however, is that this functionality is affected by
the "--export" flag.

- if I don't use the export, it works as expected.
- if I use export with "export=ALL", it is OK too.
- if I use export with "export=NONE" or "export=myVar=myValue", then:
-- set_ativate_plugin_var is called
-- slurm_spank_task_init is called
-- slurm_spank_task_init does NOT have activate_plugin_var with the
correct value.

The result is that my plugin does not detect that the user used the
flag, and it does nothing. I would say that this is an inconsistency,
as the behavior of the plugin should not depend on an unrelated flag.

Reading the manual, I saw that with the "--export" options that cause the
error in my plugin, "the --get-user-env option will implicitly be set".
I thought that this might be the cause of the problem, but additional
tests on that flag proved me wrong. I am now a bit lost, so any help
would be welcome.

I guess my question would be,

- have you noticed a similar behavior? Is it expected, or in fact an
inconsistency?
- if you have coded a similar spank plugin, have you got any
suggestion for its architecture?
- have you got any solution to my problem, maybe through a different
approach or whatever?


thanks,


Manuel


[slurm-dev] Re: Change in srun buffered output?

2017-01-10 Thread Manuel Rodríguez Pascual

Hi Andy,

I was facing the same issue, but I was worried that it was because
something missconfigured or whatever. Did you receive an
answer/explanation from anybody?


Cheers,


Manuel

2017-01-05 12:00 GMT-05:00 Andy Riebs :
>
> Hi Y'all,
>
> Historically, our users often use srun from their console windows so that
> they can watch the progress of their jobs. When we transitioned from 16.05.0
> to 16.05.7, we discovered that jobs submitted from the console with srun
> were buffering their output until the jobs completed, causing a number of
> jobs to be canceled because their users thought that they had hung!
>
> We have found that specifying "--unbuffered" or setting SLURM_UNBUFFEREDIO=1
> will remedy this problem, but we were surprised to encounter it in the first
> place. Was this an intentional change in functionality?
>
> Andy
>
> --
> Andy Riebs
> andy.ri...@hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
> May the source be with you!


[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Manuel Rodríguez Pascual
Hi,

You are both right :)  The problem is kind of solved now.

As Douglas and Janne stated, after changing my SlurmdSpoolDir to a local one
the error in the subject of this mail disappeared. I can now run "srun -n
2 --tasks-per-node=1 ./helloWorldMPI" with no problem. However, it does
not behave as expected (or at least as I would like), as it creates a
job with just 1 task on each node instead of a parallel one. This leads us
to the next point.

As Janne pointed out, my mvapich was not correctly compiled to support
srun. I managed to solve the compilation errors by compiling with
"--with-pm=slurm". The problem was basically that "/usr/local/lib" was not
exported in LD_LIBRARY_PATH.

The problem with this new mvapich compilation is that "mpiexec" , "mpirun"
and all the similar commands related to mpi execution are not created, as
you are stating that srun will be used for that (as stated here
https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_What_are_process_managers.3F
). So altogether, you can either choose to execute mpi jobs with "mpiexec"
or with "srun". Is this correct, or am I missing something?

Thanks for your help and your fast support. best regards,


Manuel



2016-11-18 14:37 GMT+01:00 Douglas Jacobsen <dmjacob...@lbl.gov>:

> Hello,
>
> Is " /home/localsoft/slurm/spool" local to the node?  Or is it on the
> network?  I think each node needs to have separate data (like job_cred)
> stored there, and if each slurmd is competing for that file naming space I
> could imagine that srun could have problems.  I typically use
> /var/spool/slurmd.
>
> From the slurm.conf page:
>
> """
>
> *SlurmdSpoolDir* Fully qualified pathname of a directory into which the
> *slurmd* daemon's state information and batch job script information are
> written. This must be a common pathname for all nodes, but should represent
> a directory which is local to each node (reference a local file system).
> The default value is "/var/spool/slurmd". Any "%h" within the name is
> replaced with the hostname on which the *slurmd* is running. Any "%n"
> within the name is replaced with the Slurm node name on which the *slurmd*
> is running.
>
> """
>
> I hope that helps,
>
> Doug
> On 11/18/16 1:07 AM, Janne Blomqvist wrote:
>
> On 2016-11-17 12:53, Manuel Rodríguez Pascual wrote:
>
> Hi all,
>
> I keep having some issues using Slurm + mvapich2. It seems that I cannot
> correctly configure Slurm and mvapich2 to work together. In particular,
> sbatch works correctly but srun does not.  Maybe someone here can
> provide me some guidance, as I suspect that the error is an obvious one,
> but I just cannot find it.
>
> CONFIGURATION INFO:
> I am employing Slurm 17.02.0-0pre2 and mvapich 2.2.
> Mvapich is compiled with "--disable-mcast --with-slurm=<slurm install location>"  <--- there is a note about this at the bottom of the mail
> Slurm is compiled with no special options. After compilation, I executed
> "make && make install" in "contribs/pmi2/" (I read it somewhere)
> Slurm is configured with "MpiDefault=pmi2" in slurm.conf
>
> TESTS:
> I am executing a "helloWorldMPI" that displays a hello world message and
> writes down the node name for each MPI task.
>
> sbatch works perfectly:
>
> $ sbatch -n 2 --tasks-per-node=2 --wrap 'mpiexec  ./helloWorldMPI'
> Submitted batch job 750
>
> $ more slurm-750.out
> Process 0 of 2 is on acme12.ciemat.es
> Hello world from process 0 of 2
> Process 1 of 2 is on acme12.ciemat.es
> Hello world from process 1 of 2
>
> $sbatch -n 2 --tasks-per-node=1 -p debug --wrap 'mpiexec  ./helloWorldMPI'
> Submitted batch job 748
>
> $ more slurm-748.out
> Process 0 of 2 is on acme11.ciemat.es
> Hello world from process 0 of 2
> Process 1 of 2 is on acme12.ciemat.es
> Hello world from process 1 of 2
>
>
> However, srun fails.
> On a single node it works correctly:
> $ srun -n 2 --tasks-per-node=2   ./helloWorldMPI
> Process 0 of 2 is on acme11.ciemat.es
> Hello world from process 0 of 2
> Process 1 of 2 is on acme11.ciemat.es
> Hello world from process 1 of 2
>
> But when using more than one node, it fails. Below there is the
> experiment with a lot of debugging info, in case it helps.

[slurm-dev] problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-17 Thread Manuel Rodríguez Pascual
Hi all,

I keep having some issues using Slurm + mvapich2. It seems that I cannot
correctly configure Slurm and mvapich2 to work together. In particular,
sbatch works correctly but srun does not.  Maybe someone here can provide
me some guidance, as I suspect that the error is an obvious one, but I just
cannot find it.

CONFIGURATION INFO:
I am employing Slurm 17.02.0-0pre2 and mvapich 2.2.
Mvapich is compiled with "--disable-mcast --with-slurm=<slurm install location>"
 <--- there is a note about this at the bottom of the mail
Slurm is compiled with no special options. After compilation, I executed
"make && make install" in "contribs/pmi2/" (I read it somewhere)
Slurm is configured with "MpiDefault=pmi2" in slurm.conf

TESTS:
I am executing a "helloWorldMPI" that displays a hello world message and
writes down the node name for each MPI task.

sbatch works perfectly:

$ sbatch -n 2 --tasks-per-node=2 --wrap 'mpiexec  ./helloWorldMPI'
Submitted batch job 750

$ more slurm-750.out
Process 0 of 2 is on acme12.ciemat.es
Hello world from process 0 of 2
Process 1 of 2 is on acme12.ciemat.es
Hello world from process 1 of 2

$sbatch -n 2 --tasks-per-node=1 -p debug --wrap 'mpiexec  ./helloWorldMPI'
Submitted batch job 748

$ more slurm-748.out
Process 0 of 2 is on acme11.ciemat.es
Hello world from process 0 of 2
Process 1 of 2 is on acme12.ciemat.es
Hello world from process 1 of 2


However, srun fails.
On a single node it works correctly:
$ srun -n 2 --tasks-per-node=2   ./helloWorldMPI
Process 0 of 2 is on acme11.ciemat.es
Hello world from process 0 of 2
Process 1 of 2 is on acme11.ciemat.es
Hello world from process 1 of 2

But when using more than one node, it fails. Below there is the experiment
with a lot of debugging info, in case it helps.

(note that the job ID will be different sometimes as this mail is the
result of multiple submissions and copy/pastes)

$ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
slurmstepd: error: *** STEP 753.0 ON acme11 CANCELLED AT
2016-11-17T10:19:47 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: acme11: task 0: Killed
srun: error: acme12: task 1: Killed


Slurmctld output:
slurmctld: debug2: Performing purge of old job records
slurmctld: debug2: Performing full system state save
slurmctld: debug3: Writing job id 753 to header record of job_state file
slurmctld: debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION from
uid=500
slurmctld: debug3: JobDesc: user_id=500 job_id=N/A partition=(null)
name=helloWorldMPI
slurmctld: debug3:cpus=2-4294967294 pn_min_cpus=-1 core_spec=-1
slurmctld: debug3:Nodes=1-[4294967294] Sock/Node=65534 Core/Sock=65534
Thread/Core=65534
slurmctld: debug3:pn_min_memory_job=18446744073709551615
pn_min_tmp_disk=-1
slurmctld: debug3:immediate=0 features=(null) reservation=(null)
slurmctld: debug3:req_nodes=(null) exc_nodes=(null) gres=(null)
slurmctld: debug3:time_limit=-1--1 priority=-1 contiguous=0 shared=-1
slurmctld: debug3:kill_on_node_fail=-1 script=(null)
slurmctld: debug3:argv="./helloWorldMPI"
slurmctld: debug3:stdin=(null) stdout=(null) stderr=(null)
slurmctld: debug3:work_dir=/home/slurm/tests alloc_node:sid=acme31:11229
slurmctld: debug3:power_flags=
slurmctld: debug3:resp_host=172.17.31.165 alloc_resp_port=56804
other_port=33290
slurmctld: debug3:dependency=(null) account=(null) qos=(null)
comment=(null)
slurmctld: debug3:mail_type=0 mail_user=(null) nice=0 num_tasks=2
open_mode=0 overcommit=-1 acctg_freq=(null)
slurmctld: debug3:network=(null) begin=Unknown cpus_per_task=-1
requeue=-1 licenses=(null)
slurmctld: debug3:end_time= signal=0@0 wait_all_nodes=-1 cpu_freq=
slurmctld: debug3:ntasks_per_node=1 ntasks_per_socket=-1
ntasks_per_core=-1
slurmctld: debug3:mem_bind=65534:(null) plane_size:65534
slurmctld: debug3:array_inx=(null)
slurmctld: debug3:burst_buffer=(null)
slurmctld: debug3:mcs_label=(null)
slurmctld: debug3:deadline=Unknown
slurmctld: debug3:bitflags=0 delay_boot=4294967294
slurmctld: debug3: User (null)(500) doesn't have a default account
slurmctld: debug3: User (null)(500) doesn't have a default account
slurmctld: debug3: found correct qos
slurmctld: debug3: before alteration asking for nodes 1-4294967294 cpus
2-4294967294
slurmctld: debug3: after alteration asking for nodes 1-4294967294 cpus
2-4294967294
slurmctld: debug2: found 8 usable nodes from config containing
acme[11-14,21-24]
slurmctld: debug3: _pick_best_nodes: job 754 idle_nodes 8 share_nodes 8
slurmctld: debug5: powercapping: checking job 754 : skipped, capping
disabled
slurmctld: debug2: sched: JobId=754 allocated resources:
NodeList=acme[11-12]
slurmctld: sched: _slurm_rpc_allocate_resources JobId=754
NodeList=acme[11-12] usec=1340
slurmctld: debug3: Writing job id 754 to header record of job_state file
slurmctld: debug2: _slurm_rpc_job_ready(754)=3 usec=4
slurmctld: debug3: StepDesc: 

[slurm-dev] RE: Wrong behaviour of "--tasks-per-node" flag

2016-10-28 Thread Manuel Rodríguez Pascual
Hi,

After some searching into the code, I may have a clue of what is going on.

I have seen that the commit that introduces the error is this one:

72ed146cd2a6facb76e854919fb887faf3fc0c25 (date May. 11th)

I have modified the newest version of Slurm, src/srun/libsrun/opt.c (line
2279), to print the values of opt

info("opt.ntasks=%u \n"
"opt.ntasks_per_node=%u\n"
"opt.min_nodes=%u\n"
"opt.ntasks_set=%d\n",

opt.ntasks,opt.ntasks_per_node,opt.min_nodes,opt.ntasks_set);


before the error condition, which is:

if ((opt.ntasks_per_node != NO_VAL) &&
    (opt.ntasks_per_node != (opt.ntasks / opt.min_nodes))) {
        if (opt.ntasks > opt.ntasks_per_node)
                info("Warning: can't honor --ntasks-per-node "
                     "set to %u which doesn't match the "
                     "requested tasks %u with the number of "
                     "requested nodes %u. Ignoring "
                     "--ntasks-per-node.", opt.ntasks_per_node,
                     opt.ntasks, opt.min_nodes);
        opt.ntasks_per_node = NO_VAL;


Result shows the problem:

-bash-4.2$ sbatch --ntasks=16  --tasks-per-node=2  test.sh
-bash-4.2$ more slurm-475.out
srun: opt.ntasks=8
opt.ntasks_per_node=2
opt.min_nodes=8
opt.ntasks_set=1
srun: Warning: can't honor --ntasks-per-node set to 2 which doesn't match
the requested tasks 8 with the number of requested nodes 8. Ignoring
--ntasks-per-node.

-bash-4.2$ sbatch --ntasks=16  --tasks-per-node=4  test.sh
-bash-4.2$ more slurm-476.out
srun: opt.ntasks=4
opt.ntasks_per_node=4
opt.min_nodes=4
opt.ntasks_set=1
OK

-bash-4.2$ sbatch --ntasks=16  --tasks-per-node=8  test.sh
-bash-4.2$ more slurm-477.out
srun: opt.ntasks=2
opt.ntasks_per_node=8
opt.min_nodes=2
opt.ntasks_set=1
OK

-bash-4.2$ sbatch --ntasks=16  --tasks-per-node=16  test.sh
-bash-4.2$ more slurm-478.out
srun: opt.ntasks=1
opt.ntasks_per_node=16
opt.min_nodes=1
opt.ntasks_set=1
OK

The calculation of min_nodes is always correct given the specified
tasks-per-node, so the error should not arise.

Instead, if I do:

-bash-4.2$ sbatch --ntasks=16  --tasks-per-node=16 --nodes=2  test.sh
-bash-4.2$ more slurm-479.out
srun: opt.ntasks=2
opt.ntasks_per_node=16
opt.min_nodes=2
opt.ntasks_set=1
OK

In this case an error should arise, but it does not.

Altogether, I think the condition should be rewritten to something like:

if ((opt.ntasks_per_node != NO_VAL) &&
    (opt.ntasks < opt.ntasks_per_node * opt.min_nodes))

but with opt.ntasks being the value introduced by the user, not the one
internally considered by Slurm at this point. I don't know how to
correct this, but I hope this helps to point towards the problem.

Best regards,


Manuel


[slurm-dev] Wrong behaviour of "--tasks-per-node" flag

2016-10-21 Thread Manuel Rodríguez Pascual
Hi all,

I am having the weirdest error ever.  I am pretty sure this is a bug. I
have reproduced the error in latest slurm commit (slurm 17.02.0-0pre2,
 commit 406d3fe429ef6b694f30e19f69acf989e65d7509 ) and in slurm 16.05.5
branch. It does NOT happen in slurm 15.08.12 .

My cluster is composed by 8 nodes, each with 2 sockets, each with 8 cores.
Slurm.conf content is

SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear  #DEDICATED NODES
NodeName=acme[11-14,21-24] CPUs=16 Sockets=2 CoresPerSocket=8
ThreadsPerCore=1 State=UNKNOWN

I am running a simple hello World parallel code. It is submitted as "sbatch
--ntasks=X --tasks-per-node=Y myScript.sh ". The problem is that, depending
on the values of X and Y, Slurm performs a wrong operation and returns an
error.

"
sbatch --ntasks=8 --tasks-per-node=2 myScript.sh
srun: Warning: can't honor --ntasks-per-node set to 2 which doesn't match
the requested tasks 4 with the number of requested nodes 4. Ignoring
--ntasks-per-node.
"
Note that  I did not request 4 but 8 tasks, and I did not request any
number of nodes.  Same happens with
"
sbatch --ntasks=16 --tasks-per-node=2 myScript.sh
srun: Warning: can't honor --ntasks-per-node set to 2 which doesn't match
the requested tasks 8 with the number of requested nodes 8. Ignoring
--ntasks-per-node.
"
and
"
sbatch --ntasks=32 --tasks-per-node=4 myScript.sh
srun: Warning: can't honor --ntasks-per-node set to 4 which doesn't match
the requested tasks 8 with the number of requested nodes 8. Ignoring
--ntasks-per-node.
"
All the rest of configurations work correctly and do not return any error.
In particular, I have tried the following combinations with no problem:
(ntasks, tasks-per-node)
(1,1)
(2,1), (2,2)
(4,1), (4,2), (4,4)
(8,1), (4,4), (8,8)
(16,4), (16,8), (16,16)
(32,8), (32,16)
(64,8), (64, 16)
(128, 16)

As said, this does not happen when executing the very same commands and
scripts with slurm 15.08.12. So, have you had any similar experiences? Is
this a bug, a desired behaviour, or am I doing something wrong?

Thanks for your help.

Best regards,



Manuel


[slurm-dev] Re: cons_res / CR_CPU - we don't have select plugin type 102

2016-10-04 Thread Manuel Rodríguez Pascual
Hi Jose,

I don't know if it's the case, but this error tends to arise after changing
the configuration in slurmctld but not rebooting the compute nodes, or when
they have a different configuration there. Have you double-checked this?

Best regards,

Manuel

On Tuesday, 4 October 2016, Jose Antonio
wrote:

>
> Hi,
>
> Currently I have set the SelectType parameter to "select/linear", which
> works fine. However, when a job is sent to a node, the job takes all the
> cpus of the machine, even if it only uses 1 core.
>
> That is why I changed SelectType to "select/cons_res" and its
> SelectTypeParameters to "CR_CPU", but this doesn't seem to work. If I
> try to send a task to a partition, which works with select/linear, the
> following message pops up:
>
> sbatch: error: slurm_receive_msg: Zero Bytes were transmitted or received
> sbatch: error: Batch job submission failed: Zero Bytes were transmitted
> or received
>
> The log in the server node (/var/log/slurmctld.log):
>
> error: we don't have select plugin type 102
> error: select_g_select_jobinfo_unpack: unpack error
> error: Malformed RPC of type REQUEST_SUBMIT_BATCH_JOB(4003) received
> error: slurm_receive_msg: Header lengths are longer than data received
> error: slurm_receive_msg [155.54.204.200:38850]: Header lengths are
> longer than data received
>
> There is no update in the compute node logs after this error comes up.
>
> Any ideas?
>
> Thanks,
>
> Jose
>


[slurm-dev] Re: new CRIU plugin

2016-08-31 Thread Manuel Rodríguez Pascual

Hi Chris,

At this moment, CRIU can only checkpoint serial applications. It is an
ongoing project so this may change in the future, but I am pretty
confident it will remain like this in the short and medium term.
However, we are also working with the developers of DMTCP
(http://dmtcp.sourceforge.net/) on the Slurm driver. It is almost
finished, and a beta version is already being tested :)  DMTCP can
checkpoint parallel applications (I have tried MVAPICH, not sure about
OpenMPI right now) and GPUs are on their roadmap too, so it may be useful
for you.

Anyway, I'll do a presentation on all this at the upcoming Slurm User
Meeting, so in a few weeks I will hopefully produce a PDF with a full
comparison among them all in terms of performance, requirements and
integration with Slurm.

Cheers,


Manuel



2016-08-31 4:02 GMT+02:00 Christopher Samuel <sam...@unimelb.edu.au>:
>
> On 30/08/16 22:11, Manuel Rodríguez Pascual wrote:
>
>> We hope that this can be useful for the Slurm community.
>
> That's really pretty neat!
>
> I can't test myself as we're stuck on RHEL6 for the moment but I do
> wonder if you've considered doing the same for Open-MPI so that Slurm
> can do checkpoint/resume for it in the same way it does for BLCR at the
> moment?
>
> All the best,
> Chris
> --
>  Christopher SamuelSenior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci

[slurm-dev] new CRIU plugin

2016-08-30 Thread Manuel Rodríguez Pascual

Hi all,

After working together with CRIU ( https://criu.org/Main_Page )
developers, my team at CIEMAT has developed a CRIU plugin for Slurm.
This way, Slurm can employ this checkpoint/restart library to perform
these operations.

It is stored in my personal github account,
https://github.com/supermanue/slurm/tree/criuPlugin , as a branch of a
fairly new Slurm version.

Regarding the code, it is basically a clone of BLCR plugin modified to
CRIU requirements and functionality. It comprises:

- the plugin itself, stored in src/plugins/checkpoint/criu
- a new "--with-criu" compilation flag (plus the related files) so a
user can specify criu location if it is not the default one
- a modification in the SPANK behaviour (spank.h and plugstack.c) so a
spank plugin can get the location of the Slurm checkpoint folder
calling spank_get_item with "S_CHECKPOINT_DIR"
- some minor changes in other compilation-related files

We hope that this can be useful for the Slurm community.  Feel free to
test and use it :) And of course, any feedback (comment, criticism) is
welcome.

Thanks for your attention. Best regards,


Manuel


[slurm-dev] Re: jobs assigned to different cores than what Slurm thinks

2016-06-22 Thread Manuel Rodríguez Pascual
Hi Omer,

As a first step, I would update Slurm to the latest version. 2.6 is kind of
old, so maybe your problem is a bug that has been solved by now.

Besides, could you post a bit more about your system (MPI library?) and
slurm.conf relevant information?

Cheers,


Manuel

2016-06-22 0:15 GMT+02:00 omer bromberg :

> I am relatively new to the Slurm system.
> We have a cluster of several hundreds cores connected with infiniband
> switch.
> It is run by a Slurm scheduler ver 2.6.5-1 installed on an Ubuntu 14.04 OS.
> Ever since we installed the system we are experiencing problems with
> oversubscription of
> cores on nodes.
>
> From what we've been able to figure out, when we run a parallel job and
> request a number
> of cores, let's say 24:
> #SBATCH -n 24
>
> Slurm assign the right number of cores to the job and divide the cores
> between the nodes
> so that each node will not be overloaded. Our nodes have 16 cores each. So
> if node 1 is
> empty and node 2 has 8 free cores, in the output file it'll say that 16
> cores were assigned to node
> 1 and 8 cores to node 2. Therefore each node should have a load of 16.
>
> HOWEVER in practice each node gets a different number of cores. Node 1 can
> get only 8 cores
> leaving it half empty, while node 2 will get the rest of the 16 cores
> bringing it to a load of 28.
> We haven't figured out if there is any rule in the way the cores are
> actually divided or if it is random.
> But it is definitely NOT how Slurm divides it and how it thinks it divides
> it.
>
> Any Idea how to resolve this issue?
> Thanks
> Omer
>


[slurm-dev] Re: Invalid user id when using Slurm API

2016-05-05 Thread Manuel Rodríguez Pascual

Hi all,

After looking at the problem with Sergio in a separate mail, we found
that the problem was the "slurm_init_job_desc_msg" call. It sets default
values for all the fields of job_desc_msg_t, so it was overwriting the
previously assigned values. It has been moved to just after the variable
declaration, and now everything works as expected.
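
In other words, the working order is roughly this (a trimmed-down sketch of
the corrected code, not Sergio's full application):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <slurm/slurm.h>

int main(void)
{
        job_desc_msg_t job;
        resource_allocation_response_msg_t *alloc;

        /* Set the defaults FIRST, then overwrite the fields we care about.
         * Calling slurm_init_job_desc_msg() afterwards would reset them. */
        slurm_init_job_desc_msg(&job);
        job.name      = (char *) "new job";
        job.min_nodes = 1;
        job.user_id   = getuid();
        job.group_id  = getgid();

        if (!(alloc = slurm_allocate_resources_blocking(&job, 0, NULL))) {
                slurm_perror((char *) "slurm_allocate_resources_blocking error");
                exit(1);
        }
        printf("Allocated job %u\n", alloc->job_id);
        slurm_free_resource_allocation_response_msg(alloc);
        return 0;
}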

I am writing this mail so anybody with the same problem landing on
this thread can read our solution.

Best regards,


Manuel

2016-05-03 12:03 GMT+02:00 Sergio Iserte :
> Hello everyone,
>
> I am developing an application that calls the Slurm API functions. In my
> first attemps I have been working with this code:
>>
>>
>> job_desc_msg_t job;
>> job.name = (char *)"new job";
>> job.min_nodes = 1;
>> job.user_id = getuid();
>> job.group_id = getgid();
>> slurm_init_job_desc_msg(&job);
>> resource_allocation_response_msg_t* slurm_alloc_msg_ptr;
>> if (!(slurm_alloc_msg_ptr = slurm_allocate_resources_blocking(&job, 0,
>> NULL))) {
>>  slurm_perror((char *)"slurm_allocate_resources_blocking error");
>>  exit(1);
>>  }
>
> Where I declare a new job, give some information and finally ask for
> resources.
>
> What the controller daemon writes in the log is:
>>
>>
>> slurmctld: debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION from
>> uid=16777240
>> slurmctld: debug3: JobDesc: user_id=4294967294 job_id=N/A partition=(null)
>> name=(null)
>> slurmctld: debug3:cpus=-1-4294967294 pn_min_cpus=-1 core_spec=-1
>
>
> ... and it finishes with ...
>
>> slurmctld: _validate_job_desc: job failed to specify User
>> slurmctld: _slurm_rpc_allocate_resources: Invalid user id
>
>
> I do not know why the user_id changes when the message is going to be
> processed.
>
> Thank you.
>
> --
> Sergio Iserte
> High Performance Computing & Architectures (HPCA)
> Department of Computer Science and Engineering (DICC)
> Universitat Jaume I (UJI), Spain.
>


[slurm-dev] slurm_auth_unpack error: packed by slurmctld_p unpack by auth/munge

2016-04-20 Thread Manuel Rodríguez Pascual

Hi all,

working on a slurm plugin, I've come to this error.

"slurm_auth_unpack error: packed by slurmctld_p unpack by auth/munge"

it appears when I execute a simple slurm API call from my plugin,

job_info_msg_t *job_ptr;
uint16_t show_flags = 0;
if ((error = slurm_load_job(&job_ptr, job_id, show_flags)) !=
SLURM_SUCCESS)   <--- here

auth/munge is the kind of autorization I have configured. If I change
it, for example to auth/none, the error is similar, something like

"slurm_auth_unpack error: packed by slurmctld_p unpack by auth/none"

This is just at the beginning of the plugin.

I don't know what exactly I am doing wrong. Is it a compilation issue?
Should I initialize the plugin somehow? It does contain an "init"
method, but I am not calling it. Instead, I am just calling the
relevant function from an external application. I don't know whether
this is the correct behaviour though.  Also, I've placed my plugin on
src/plugins/slurmctld/myPlugin (it might be relevant, as the error says
"packed by slurmctld..." ).

So, any clues or advices on what am I doing wrong?

Thanks for your help. Best regards,


Manuel


[slurm-dev] Re: Slurm Checkpoint/Restart example

2016-04-14 Thread Manuel Rodríguez Pascual

There is a good tutorial on how to use DMTCP on their github page,

https://github.com/dmtcp/dmtcp/blob/master/QUICK-START.md

I would start there. Anyway, probably this Slurm mailing list is not
the best place to ask for that information.

Best regards,

Manuel

2016-04-14 11:01 GMT+02:00 Husen R :
> Hi all,
> Thank you for your reply
>
> Danny :
> I have installed BLCR and SLURM successfully.
> I also have configured CheckpointType, --checkpoint, --checkpoint-dir and
> JobCheckpointDir in order for slurm to support checkpoint.
>
> I have tried to checkpoint a simple MPI parallel application many times in
> my small cluster, and like you said, after checkpoint is completed there is
> a directory named with jobid in  --checkpoint-dir. in that directory there
> is a file named "script.ckpt". I tried to restart directly using srun
> command below :
>
> srun --mpi=pmi2 --restart-dir=/mirror/source/cr/51 ./mm.o
>
> where --restart-dir is directory that contains "script.ckpt".
> Unfortunately, I got the following error :
>
> Failed to open(/mirror/source/cr/51/task.0.ckpt, O_RDONLY): No such file or
> directory
> srun: error: compute-node: task 0: Exited with exit code 255
>
> As we can see from the error message above, there was no "task.0.ckpt" file.
> I don't know how to get such file. The files that I got from checkpoint
> operation is a file named "script.ckpt" in --checkpoint-dir and two files in
> JobCheckpointDir named ".ckpt" and ".ckpt.old".
>
> According to the information in section srun in this link
> http://slurm.schedmd.com/checkpoint_blcr.html, after checkpoint is completed
> there should be checkpoint files of the form ".ckpt" and
> "..ckpt" in --checkpoint-dir.
>
> Any idea to solve this ?
>
> Manuel :
>
> Yes, BLCR doesn't support checkpoint/restart parallel/distributed
> application by itself (
> https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#mpi). But it can be used by
> other software to do that (I hope the software is SLURM..huhu)
>
> I have ever tried to restart mpi application using DMTCP but it doesn't
> work.
> Would you please tell me how to do that ?
>
>
> Thank you in advance,
>
> Regards,
>
>
> Husen
>
>
>
>
>
> On Thu, Apr 14, 2016 at 12:03 PM, Danny Rotscher
>  wrote:
>>
>> I forgot something to add: you have to create a directory for the
>> checkpoint metadata, which is by default located in /var/slurm/checkpoint:
>> mkdir -p /var/slurm/checkpoint
>> chown -R slurm /var/slurm
>> or you define your own directory in slurm.conf:
>> JobCheckpointDir=<directory>
>>
>> The parameters you could check with:
>> scontrol show config | grep checkpoint
>>
>> Kind regards,
>> Danny
>> TU Dresden
>> Germany
>>
>> On 14.04.2016 at 06:41, Danny Rotscher wrote:
>>>
>>> Hello,
>>>
>>> we don't get it to work too, but we already build Slurm with the BLCR.
>>>
>>> You first have to install the BLCR library, which is described on the
>>> following website:
>>> https://upc-bugs.lbl.gov/blcr/doc/html/BLCR_Admin_Guide.html
>>>
>>> Then we build and installed Slurm from source and BLCR checkpointing has
>>> been included.
>>>
>>> After that you have to set at least one Parameter in the file
>>> "slurm.conf":
>>> CheckpointType=checkpoint/blcr
>>>
>>> There are two ways to create checkpoints: you could either make a
>>> checkpoint with the following command from outside your job:
>>> scontrol checkpoint create <jobid>
>>> or you could let Slurm do some periodical checkpoints with the following
>>> sbatch parameter:
>>> #SBATCH --checkpoint <interval>
>>> We also tried:
>>> #SBATCH --checkpoint <hours>:<minutes>
>>> e.g.
>>> #SBATCH --checkpoint 0:10
>>> to test it, but it doesn't work for us.
>>>
>>> We also set the parameter for the checkpoint directory:
>>> #SBATCH --checkpoint-dir <directory>
>>>
>>> After you create a checkpoint, and a directory named with your jobid has been
>>> created in your checkpoint directory, you could restart the job with the
>>> following command:
>>> scontrol checkpoint restart <jobid>
>>>
>>> We tested some sequential and openmp programs with different parameters
>>> and it works (checkpoint creation and restarting),
>>> but *we don't get any mpi library to work*, we already tested some
>>> programs build with openmpi and intelmpi.
>>> The checkpoint will be created but we get the following error when we
>>> want to restart them:
>>> - Failed to open file '/'
>>> - cr_restore_all_files [28534]:  Unable to restore fd 3 (type=1,err=-21)
>>> - cr_rstrt_child [28534]:  Unable to restore files!  (err=-21)
>>> Restart failed: Is a directory
>>> srun: error: taurusi4010: task 0: Exited with exit code 21
>>>
>>> So, it would be great if you could confirm our problems, maybe then
>>> schedmd higher up the priority of such mails;-)
>>> If you get it to work, please help us to understand how.
>>>
>>> Kind reagards,
>>> Danny
>>> TU Dresden
>>> Germany
>>>
>>> Am 11.04.2016 um 10:09 schrieb Husen R:

 Hi all,

 Based on the information in this link
 

[slurm-dev] Re: Slurm Checkpoint/Restart example

2016-04-14 Thread Manuel Rodríguez Pascual

Hi Danny, all,

As far as I know, unfortunately BLCR does not have MPI support.
At least I haven't been able to make it work.

On the other hand, DMTCP ( http://dmtcp.sourceforge.net/ ) does work
with MPI. My team is very interested in having a reliable
checkpoint/restart mechanism in Slurm, so we are now working on a plugin to
integrate it. We are facing some technical problems, but are working
together with the DMTCP team to solve them, and we are confident of having
the integration ready soon.

Anyway, I'll send a mail to this list when it's ready.

Cheers,


Manuel


2016-04-14 7:03 GMT+02:00 Danny Rotscher :
> I forgot something to add: you have to create a directory for the checkpoint
> metadata, which is by default located in /var/slurm/checkpoint:
> mkdir -p /var/slurm/checkpoint
> chown -R slurm /var/slurm
> or you define your own directory in slurm.conf:
> JobCheckpointDir=<directory>
>
> The parameters you could check with:
> scontrol show config | grep checkpoint
>
> Kind regards,
> Danny
> TU Dresden
> Germany
>
> On 14.04.2016 at 06:41, Danny Rotscher wrote:
>>
>> Hello,
>>
>> we don't get it to work too, but we already build Slurm with the BLCR.
>>
>> You first have to install the BLCR library, which is described on the
>> following website:
>> https://upc-bugs.lbl.gov/blcr/doc/html/BLCR_Admin_Guide.html
>>
>> Then we build and installed Slurm from source and BLCR checkpointing has
>> been included.
>>
>> After that you have to set at least one Parameter in the file
>> "slurm.conf":
>> CheckpointType=checkpoint/blcr
>>
>> There are two ways to create checkpoints: you could either make a
>> checkpoint with the following command from outside your job:
>> scontrol checkpoint create <jobid>
>> or you could let Slurm do some periodical checkpoints with the following
>> sbatch parameter:
>> #SBATCH --checkpoint <interval>
>> We also tried:
>> #SBATCH --checkpoint <hours>:<minutes>
>> e.g.
>> #SBATCH --checkpoint 0:10
>> to test it, but it doesn't work for us.
>>
>> We also set the parameter for the checkpoint directory:
>> #SBATCH --checkpoint-dir <directory>
>>
>> After you create a checkpoint, and a directory named with your jobid has been
>> created in your checkpoint directory, you could restart the job with the
>> following command:
>> scontrol checkpoint restart <jobid>
>>
>> We tested some sequential and openmp programs with different parameters
>> and it works (checkpoint creation and restarting),
>> but *we don't get any mpi library to work*, we already tested some
>> programs build with openmpi and intelmpi.
>> The checkpoint will be created but we get the following error when we want
>> to restart them:
>> - Failed to open file '/'
>> - cr_restore_all_files [28534]:  Unable to restore fd 3 (type=1,err=-21)
>> - cr_rstrt_child [28534]:  Unable to restore files!  (err=-21)
>> Restart failed: Is a directory
>> srun: error: taurusi4010: task 0: Exited with exit code 21
>>
>> So, it would be great if you could confirm our problems, maybe then
>> schedmd higher up the priority of such mails;-)
>> If you get it to work, please help us to understand how.
>>
>> Kind reagards,
>> Danny
>> TU Dresden
>> Germany
>>
>> Am 11.04.2016 um 10:09 schrieb Husen R:
>>>
>>> Hi all,
>>>
>>> Based on the information in this link
>>> http://slurm.schedmd.com/checkpoint_blcr.html,
>>> Slurm able to checkpoint the whole batch jobs and then Restart execution
>>> of
>>> batch jobs and job steps from checkpoint files.
>>>
>>> Anyone please tell me how to do that ?
>>> I need help.
>>>
>>> Thank you in advance.
>>>
>>> Regards,
>>>
>>>
>>> Husen Rusdiansyah
>>> University of Indonesia
>>
>>
>


[slurm-dev] Concurrence with Slurm: How can I force a job to be executed on a given node?

2016-04-12 Thread Manuel Rodríguez Pascual
Hi all,

I am dealing with job scheduling and concurrence in Slurm. I am a bit
stuck, so maybe someone can give me a hint on this.

I have a job that I want to be executed on a given node. I am performing
the submission process with an application using Slurm API. The steps I am
following so far are:

- I check if the job can run in a given node  with slurm_job_will_run. I
make sure that it is empty and job can be executed.

- I send the job to that node employing slurm_submit_batch_job command.

- If I am lucky, my job will start immediately

- I am not, in the meanwhile a job has been allocated to that node and my
job will have to wait.

As you can see, there is an obvious race condition here. The question is,
do you know any way of avoiding it?

A workaround I have thought of is to give the job the highest possible
priority and forbid that priority for user-submitted jobs. It doesn't look
very elegant, though. Another approach could be to make a reservation and
(if successful) then execute the job. This, however, would -correct me if
I'm wrong- have the drawback of having to delete the reservation after the
job execution. It is probably not a big deal, but it doesn't look like the
best solution either. Altogether, I keep having the feeling that there is an
obvious solution that I am not considering.
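
For reference, the reservation-based workaround above could be scripted along
these lines (reservation name, node, user and duration are illustrative):

# reserve the target node for the submitting user before submitting
scontrol create reservation ReservationName=myjob_resv Nodes=node01 \
    Users=myuser StartTime=now Duration=60

# submit the job into that reservation, so no other job can take the node
sbatch --reservation=myjob_resv job.sh

# once the job has finished, drop the reservation again
scontrol delete ReservationName=myjob_resv

The drawback mentioned above remains: something still has to remove the
reservation once the job finishes.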


Any ideas or suggestions?

Thanks for your help,

Manuel
-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Re: scch not found !!

2016-02-10 Thread Manuel Rodríguez Pascual

Hi David,

As can be read in man of slurm.conf,

checkpoint/blcr   Berkeley Lab Checkpoint Restart (BLCR).  NOTE: If a
file is found at sbin/scch (relative to the Slurm installation location),
it will be executed upon completion of the checkpoint. This can be a
script used for managing the checkpoint files.  NOTE: Slurm's BLCR logic
only supports batch jobs.


As far as I know, if that file is not present, the system will
complain and stop the checkpoint. So the solution is to create an
empty shell script, in your case in /usr/sbin/scch, and it will stop
complaining.
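
For example, assuming the path from your error message, an empty executable
placeholder can be created like this (as root):

# create a do-nothing scch script so the BLCR checkpoint logic has something to execute
cat > /usr/sbin/scch <<'EOF'
#!/bin/sh
exit 0
EOF
chmod 755 /usr/sbin/scch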

Please try it, and let me know if it doesn't work.

Best regards,


Manuel



2016-02-10 14:25 GMT+01:00 David Roman <david.ro...@noveltis.fr>:
> Yes, I did it.
>
>
>
> BLCR installation: In blcr-0.8.5 I did not find any file named scch
>
> tar xzf blcr-0.8.5.tar.gz
>
> cd blcr-0.8.5
>
> ./configure --enable-multilib=no
>
> make rpms
>
> cd rpm/RPMS/x86_64
>
> rpm -ivh blcr-0.8.5-1.x86_64.rpm
> blcr-modules_2.6.32_504.30.3.el6.x86_64-0.8.5-1.x86_64.rpm
> blcr-libs-0.8.5-1.x86_64.rpm blcr-devel-0.8.5-1.x86_64.rpm
>
>
>
>
>
> SLURM installation
>
> rpmbuild -ta --with blcr slurm-15.08.4.tar.bz2
>
> rpm -ivh slurm-plugins-15.08.4-1.el6.x86_64.rpm
> slurm-15.08.4-1.el6.x86_64.rpm slurm-devel-15.08.4-1.el6.x86_64.rpm
> slurm-munge-15.08.4-1.el6.x86_64.rpm slurm-perlapi-15.08.4-1.el6.x86_64.rpm
> slurm-sjobexit-15.08.4-1.el6.x86_64.rpm
> slurm-sjstat-15.08.4-1.el6.x86_64.rpm slurm-torque-15.08.4-1.el6.x86_64.rpm
> slurm-blcr-15.08.4-1.el6.x86_64.rpm
>
>
>
> If I untar slurm-15.08.4.tar.bz2, I do not find any file named scch
>
>
>
>
>
> De : Manuel Rodríguez Pascual [mailto:manuel.rodriguez.pasc...@gmail.com]
> Envoyé : mercredi 10 février 2016 12:35
> À : slurm-dev <slurm-dev@schedmd.com>
> Objet : [slurm-dev] Re: scch not found !!
>
>
>
> hi,
>
>
>
> You have to install BLCR first. It is quite straightforward though. You can
> download it from the Berkeley Lab website at
> http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/berkeley-lab-checkpoint-restart-for-linux-blcr-downloads/
>
>
>
>
>
>
>
>
>
> 2016-02-10 11:21 GMT+01:00 David Roman <david.ro...@noveltis.fr>:
>
> Hello,
>
>
>
> I am trying to use checkpointing in Slurm, but I get an error about /usr/sbin/scch
> not being found.
>
> I don't know what to do to install this file. Is it part of BLCR, of SLURM,
> or of something else?
>
>
>
> Thanks a lot for your reply.
>
>
>
> David
>
>
>
>
>
> --
>
> Dr. Manuel Rodríguez-Pascual
> skype: manuel.rodriguez.pascual
> phone: (+34) 913466173 // (+34) 679925108
>
> CIEMAT-Moncloa
> Edificio 22, desp. 1.25
> Avenida Complutense, 40
> 28040- MADRID
> SPAIN



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN

[slurm-dev] Re: scch not found !!

2016-02-10 Thread Manuel Rodríguez Pascual
hi,

You have to install BLCR first. It is quite straightforward though. You can
download it from the Berkeley Lab website at
http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/berkeley-lab-checkpoint-restart-for-linux-blcr-downloads/




2016-02-10 11:21 GMT+01:00 David Roman <david.ro...@noveltis.fr>:

> Hello,
>
>
>
> I am trying to use checkpointing in Slurm, but I get an error about /usr/sbin/scch
> not being found.
>
> I don't know what to do to install this file. Is it part of BLCR, of
> SLURM, or of something else?
>
>
>
> Thanks a lot for your reply.
>
>
>
> David
>



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Re: Slurm with BLCR and slurmctld.log error

2015-12-14 Thread Manuel Rodríguez Pascual
Hi Alvaro,

Have you compiled Slurm with "--with-blcr" flag?  When configuring with
"./configure --with-blcr"  (plus the rest of your flags) you should get
something like

(...)
checking for blcr installation... /YOUR/PATH

and then make && make install

As far as I know, that would mean that the BLCR plugin has been compiled and
the executables correctly detected.
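
As an additional check, the plugin should then show up as a shared object in
Slurm's plugin directory; the path below is an assumption, adjust it to your
installation prefix:

ls -l /usr/lib/slurm/checkpoint_blcr.so    # or <prefix>/lib/slurm/checkpoint_blcr.so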


Best regards,


Manuel





2015-12-14 15:43 GMT+01:00 alvaro gamboa <gamboafaceb...@gmail.com>:

> Hello,
>
> I'm using Slurm with Debian :
>
> *Linux version 3.16.0-4-686-pae (debian-ker...@lists.debian.org
> <debian-ker...@lists.debian.org>) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1
> SMP Debian 3.16.7-ckt17-1 (2015-09-26)*
>
> I want to use the method BLCR but I get this error in slurmctld.log:
>
> *[2015-12-10T19:48:55.844] slurmctld version 14.03.9 started on cluster
> (null)*
> *[2015-12-10T19:48:55.857] error: Couldn't find the specified plugin name
> for checkpoint/blcr looking at all files*
> *[2015-12-10T19:48:55.857] error: cannot find checkpoint plugin for
> checkpoint/blcr*
> *[2015-12-10T19:48:55.858] error: cannot create checkpoint context for
> checkpoint/blcr*
> *[2015-12-10T19:48:55.858] fatal: failed to initialize checkpoint plugin*
>
> Slurm installed packets:
>
> *ii  slurm-client  14.03.9-5
>  i386 SLURM client side commands*
> *ii  slurm-llnl14.03.9-5
>  all  transitional dummy package for slurm-wlm*
> *ii  slurm-wlm 14.03.9-5
>  i386 Simple Linux Utility for Resource Management*
> *ii  slurm-wlm-basic-plugins   14.03.9-5
>  i386 SLURM basic plugins*
> *ii  slurmctld 14.03.9-5
>  i386 SLURM central management daemon*
> *ii  slurmd14.03.9-5
>  i386 SLURM compute node daemon*
>
> BLCR installed packets:
>
> *ii  blcr-testsuite0.8.5-2.2
>  i386 Userspace tools to Checkpoint and Restart Linux processes*
> *ii  blcr-util 0.8.5-2.2
>  i386 Userspace tools to Checkpoint and Restart Linux processes*
>
> Also *libcr-dev*, *libcr-dbg* and *libcr0*.
>
> I think the solution is simple but took hours trying to fix it and don't
> get it
>
> Please help me, thank you
>



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] weird error (bug?) on srun (16.05.0-0pre1)

2015-11-24 Thread Manuel Rodríguez Pascual
unge.so
7fc1ff55b000-7fc1ff55c000 rw-p 3000 00:22 1228
  /home/localsoft/slurm/lib/slurm/auth_munge.so
7fc1ff55c000-7fc1ff574000 r-xp  00:22 29464
  /home/localsoft/slurm/lib/slurm/mpi_pmi2.so
7fc1ff574000-7fc1ff773000 ---p 00018000 00:22 29464
  /home/localsoft/slurm/lib/slurm/mpi_pmi2.so
7fc1ff773000-7fc1ff774000 r--p 00017000 00:22 29464
  /home/localsoft/slurm/lib/slurm/mpi_pmi2.so
7fc1ff774000-7fc1ff775000 rw-p 00018000 00:22 29464
  /home/localsoft/slurm/lib/slurm/mpi_pmi2.so
7fc1ff775000-7fc1ff77a000 r-xp  00:22 1057
  /home/localsoft/slurm/lib/slurm/launch_slurm.so
7fc1ff77a000-7fc1ff97a000 ---p 5000 00:22 1057
  /home/localsoft/slurm/lib/slurm/launch_slurm.so
7fc1ff97a000-7fc1ff97b000 r--p 5000 00:22 1057
  /home/localsoft/slurm/lib/slurm/launch_slurm.so
7fc1ff97b000-7fc1ff97c000 rw-p 6000 00:22 1057
  /home/localsoft/slurm/lib/slurm/launch_slurm.so
7fc1ff97c000-7fc1ff97e000 r-xp  00:22 23344
  /home/localsoft/slurm/lib/slurm/switch_none.so
7fc1ff97e000-7fc1ffb7e000 ---p 2000 00:22 23344
  /home/localsoft/slurm/lib/slurm/switch_none.so
7fc1ffb7e000-7fc1ffb7f000 r--p 2000 00:22 23344
  /home/localsoft/slurm/lib/slurm/switch_none.so
7fc1ffb7f000-7fc1ffb8 rw-p 3000 00:22 23344
  /home/localsoft/slurm/lib/slurm/switch_none.so
7fc1ffb8-7fc1ffb8f000 r-xp  00:22 23335
  /home/localsoft/slurm/lib/slurm/select_linear.so
7fc1ffb8f000-7fc1ffd8e000 ---p f000 00:22 23335
  /home/localsoft/slurm/lib/slurm/select_linear.so
7fc1ffd8e000-7fc1ffd8f000 r--p e000 00:22 23335
  /home/localsoft/slurm/lib/slurm/select_linear.so
7fc1ffd8f000-7fc1ffd9 rw-p f000 00:22 23335
  /home/localsoft/slurm/lib/slurm/select_linear.so
7fc1ffd9-7fc1ffd9b000 r-xp  08:17 67110724
  /usr/lib64/libnss_files-2.17.so
7fc1ffd9b000-7fc1fff9a000 ---p b000 08:17 67110724
  /usr/lib64/libnss_files-2.17.so
7fc1fff9a000-7fc1fff9b000 r--p a000 08:17 67110724
  /usr/lib64/libnss_files-2.17.so
7fc1fff9b000-7fc1fff9c000 rw-p b000 08:17 67110724
  /usr/lib64/libnss_files-2.17.soAbortado (`core' generado)

---
---

It is important to note that if mpi is disabled the execution succeeds.
Moreover, it has the same behaviour when using any option listed with
"--mpi=list"

srun --mpi=none  -n 2 --cpus-per-task=1 --ntasks-per-node=1 ./helloWorldMPI

Of course this is not a solution, as this creates 3 jobs with 1 thread
instead of 1 job with 3 threads, but maybe it helps to find the problem.


I assumed that it was a problem of my configuration. However, I tried
downgrading Slurm to the previous release version (slurm 15.08.4) ,
and now it works fine.

Summing up,

-in virtual environment, slurm 16.05.0-0pre1, srun and sbatch work
-in real environment, slurm 15.08, srun and sbatch work
-in real environment, slurm 16.05.0-0pre1, sbatch works, srun DOES NOT work.

Any hints?


Thanks for your help,


Manuel



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN

[slurm-dev] SPANK behavior and functionalities

2015-10-14 Thread Manuel Rodríguez Pascual
Hi all,

I have a couple of questions about SPANK plugins development that I hope
you can solve. They are kind of newbie and easy ones, so I've grouped them
in a single mail.

- Is it possible to modify the execution ARGV of an application from inside
a SPANK plugin? Like,

-> user call: srun --spank_plugin myCode
->SPANK: add "myWrapper"
->executed by srun: myWrapper myCode.

I've tried modifying the item returned by a "spank_get_item(sp,
S_JOB_ARGV, ...)" call in slurm_spank_task_init, but the changes are not
reflected in the execution. Any suggestions?

-Is it possible to know whether the user executed srun/sbatch with a given
parameter created by a SPANK plugin? I mean, to know whether the user
executed
srun --spank_plugin myCode
or
srun myCode ?

My current approach would be to create a global variable in the SPANK
plugin,

static int spank_plugin = 1; // did the user say --spank_plugin?

and set it to zero in the callback function defined in spank_option. This
way, the variable tells me whether the user passed the option.

That does work up to a point, because if in slurm_spank_task_init I start
with

  if (spank_plugin != 0) // no --spank_plugin given, exit
      return (0);

it behaves as desired.

However, in the slurm_spank_job_epilog the behavior will be different, as
the variable always has the default value, so I guess I am missing
something.

Besides this particular case, it would be really useful for me if some kind
of state could be maintained along the execution of the job in a SPANK
plugin, like a global variable accessible by all the callbacks.


Thanks for your help,

Manuel




-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Implementing slurm integration with DMTCP: some general questions

2015-09-24 Thread Manuel Rodríguez Pascual
Hi all,

I am currently implementing the plugin to connect DMTCP checkpoint library
with Slurm. I have however some design questions that you may help me to
solve.

1.-

DMTCP basically acts as a wrapper to whatever you are executing (this is a
bit of  a simplification). So if you want to run X, you have in fact to
execute "dmtcp_launch X".

In the case of srun, it is not a problem. I have created a "srun_dmtcp"
call, analogous to "srun_blcr", that does the work by effectively wrapping
the user call. So now the user has "srun" for non-checkpointed executions,
and "srun_dmtcp" for checkpointed ones.

The problem comes with "sbatch". I assume that users may or may not want to
employ DMTCP, so modifying the call's behaviour to always include this
wrapper seems too aggressive. I've thought that maybe the best idea is to
include a flag, "use-dmtcp" or something similar, so users can decide what
to do. What do you think? As a drawback, I have looked at the code, and I
think this will require pretty heavy modification at all levels: some data
structures will have to be modified, so new initializations, data
packing/unpacking and so on will be required. It is not a problem to
implement this, but maybe there are some alternatives I have not thought
about.
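
To make the two alternatives concrete, the user-facing usage under discussion
would look roughly like this (srun_dmtcp mirrors the existing srun_blcr
wrapper; the sbatch flag is only the proposal from this mail, not an existing
option):

# wrapper approach: a dedicated launcher, analogous to srun_blcr
srun_dmtcp -n 4 ./my_app

# flag approach: plain sbatch plus an opt-in flag (hypothetical name)
sbatch --use-dmtcp job.sh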


2.-

DMTCP controller (the application in charge of everything) requires a
communication port. A default one can be employed, as well as a
user-defined one. Here, the same question arises. I have modified the code
so the DMTCP port can be set in slurm.conf and then exported as an
environment variable on Slurm executions. This does, however, require quite
a lot of updates, and it probably will not be used by the majority of users.
The alternative is to hardcode a port in the DMTCP controller, avoiding the
Slurm modifications and the environment variable, but that might cause
issues on some systems, I guess. This port is defined in a shell script, so
it is in fact easy to modify if needed. Which alternative do you think is
more convenient? Have you got any other alternative? I am new to Slurm
development, so there is probably a simpler and better way of doing things.

Thanks for your help. Best regards,


Manuel



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Re: Now I have this error with libltdl.so.7

2015-09-18 Thread Manuel Rodríguez Pascual
Hi Fany,

It looks like a problem with your MPI configuration, not Slurm. As a first
test, are you able to run the application without Slurm?

You can try

./yourApplication

to run it with a single MPI task,  and

mpiexec -N 2 ./yourApplication   (or something similar; this is what I do
with MPICH)

to run it with two tasks. Do it on your master node and on the computing
elements, so you can discard configuration problems external to Slurm.
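
Since the error below complains about a missing shared library, it may also
help to check the binary's dependencies directly on a compute node (the
binary path is taken from the error message):

# any library reported as "not found" is the one the loader cannot resolve
ldd /usr/bin/mpi | grep "not found"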



2015-09-17 20:18 GMT+02:00 Fany Pagés Díaz <fpa...@udio.cujae.edu.cu>:

>
>
> Hello,
>
>
>
> I'm trying to run the job and I get this error.
>
>
>
> [root@cluster bin]# srun scriptmpi
> /usr/bin/mpi: error while loading shared libraries: libltdl.so.7: cannot
> open shared object file: No such file or directory
> srun: error: compute-0-0: task 0: Exited with exit code 127
> srun: error: compute-0-0: task 0: Exited with exit code 127
>
>
>
>
>
> And this is my script
>
>
>
> #!/bin/bash
>
>
>
> #SBATCH --job-name="mpi"
>
> #SBATCH --partition="cluster"
>
> #SBACTH --nodes=compute-0-0,compute-0-1,compute-0-2
>
> #SBATCH -n 3
>
> #SBATCH --output=test-srun.out
>
> #SBATCH --error=test-srun.err
>
> source /etc/profile
>
> module load openmpi-x86_64
>
>
>
> srun mpi
>
>
>
> Thanks,
>
> Ing.Fany Pagés Díaz
>
>
>
>
>
>
>
>
>
>
>



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Re: Simulator sync issues

2015-08-05 Thread Manuel Rodríguez Pascual
Hi Gonzalo,

I am not an expert, so please take the rest of the mail as an opinion and
not completely reliable information.

As far as I know, Slurm simulator was developed by a single guy, and it was
abandoned at some point. Now  it is not supported anymore.  It was
developed for some old branch of Slurm. As Slurm has been deeply modified
since then, it is no longer usable.

Marina Zapater has collected some information about it and put into her
github.

https://github.com/marinazapater/slurm-sim

there you can download the simulator, input data and the correct Slurm
version to make it work. It is designed to be executed on Ubuntu
12.something or 14 (LTS), and I also don't know whether it runs on any
other distro. I think this is currently the best starting point to get a
running simulator.

The code is however not perfect. It still presents some scalability issues
(memory leaks? concurrency?) that make it fail when executing large
simulations in terms of nodes or tasks. There is also no documentation at
all, besides a high level description.

I am just starting to get familiar with the simulator, so I cannot give you
any more in-depth information. Also, I would welcome any more information,
current versions or whatever you can find about this, so please feel free
to submit any update to this list (or myself).


Best regards,

2015-08-03 22:26 GMT+02:00 Gonzalo Rodrigo Alvarez gprodrigoalva...@lbl.gov
:



 Good afternoon,

 I have set up the simulator-branch from the  SchedMd fork and I have been
 experiencing some issues with using the simulator. Some I could solve
 myself, but I am having trouble with them, if anybody had similar
 experiences, I think it would be a good thing for all to share. Let's start
 first with the one I have not been able to solve:

 When a group  of jobs run longer than their runtime, slurmctld sends the
 corresponding REQUEST_KILL_TIMELIMIT rpc, which triggers the creation of
 a number of threads. The first one arrives to slurmd. But It tries to
 create more than the available proto-threads and for some reason this leads
 to slurmctld to block. Any hint on this problem? Maybe I should run it on a
 bigger VM? Now I am using a single core VM,

 Now a list of things that I observed I could solve:
 - I observed that unless I would add a sleep(1) in some threads derived
 from slurmctld, newly created threads would make
 _checking_for_new_threads go on an infinite loop. In particular: agent
 thread, backfil agent, _slurmctld_rpc_mgr loop, _slurmctld_background loop,
 and in general all the agent loops of the plugins used. (I know it is not a
 clean solution, but it worked).
 - In sim_lib: I observed that get_new_thread_id return type was changed
 form int to uint. that broke the code in pthread_create that detects the
 case in which there are no available threads (it returns -1).
 - I had to re-write the way the sleep wrapper and the _time_mgr were
 communicating with thread_sem and thread_sem_back. As it is it kept
 blocking all the time.

 Encountering these problems made me wonder if I am working with the
 correct branch (schedmd/simulator). or that the evolution of slurm is
 making the simulator code rot.  When the solutions are more stable and
 clean I will do a patch and poste it here

 Also a note to someone who was asking about the test.traces file (I am
 answering here because the google groups interface would not allow me to
 answer to it): you can find it, together with a synthetic user list at the
 original tar file distributed by the BSC, just put it in the sbin dir:
 http://www.bsc.es/marenostrum-support-services/services/slurm-simulator

 Thanks in advance!

 //Gonzalo





-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Re: Simulator sync issues

2015-08-05 Thread Manuel Rodríguez Pascual
Sounds great, thanks for the update :)

My team is now preparing a trace extracted from 5 years of usage of our
cluster (2000 cores, 15M jobs), so as soon as the simulator -old or new- is
working, it will have some real stuff to simulate. We will of course
release it and post the download link here.

Best regards,


Manuel

El miércoles, 5 de agosto de 2015, Gonzalo Rodrigo Alvarez 
gprodrigoalva...@lbl.gov escribió:

 Hi Manuel,

 Thank you for the info and the link, I will try that version and compare.

 So far, I got the same feeling with the code but I have been able to
 progress. Right now I don't have deadlocks, for that I did the following:
 1) Re-do the way thread_sem and thread_sem_back  communicate beween
 sim_mgr and the sleep wrapper.
 2) Update (BTW it does not get installed) rpc_threads.pl: the script that
 identifies the function calls associated with threads that the
 simulator should ignore. With the correct list I could remove the sleeps
 that I added in some scripts. It covers:
 - from binary slurmctld: _slurmctld_rpc_mgr, _slurmctld_signal_hand,
 _agent,  agent,
 _wdog, _thread_per_group_rpc
 - from binary accounting_storage_slurmdbd.so: _set_db_inx_thread, _agent

 The simulator seems to work without blocking, However, there are some
 issues with the way the actual duration of jobs is communicated to the
 slurmd:
 - First sim_mgr sends a REQUEST_SIM_JOB RPC to slurmd with a job-id and a
 job duration.
 - Then it uses sbatch, to submit the job.
  The problem is that the first job-id is calculated by taking into account
  sbatch failures, and there are cases where sbatch reports failure but the
  job gets submitted. In the end this results in slurmd having incoherent
  information associated to the same job-id (or no information at all).  I
 will keep updating in case someone else is trying to bring the simulator to
 life.


 Thanks!

 //Gonzalo








 On Wed, Aug 5, 2015 at 7:25 AM, Manuel Rodríguez Pascual 
 manuel.rodriguez.pasc...@gmail.com
 javascript:_e(%7B%7D,'cvml','manuel.rodriguez.pasc...@gmail.com');
 wrote:

 Hi Gonzalo,

 I am not an expert, so please take the rest of the mail as an opinion and
 not completely reliable information.

 As far as I know, Slurm simulator was developed by a single guy, and it
 was abandoned at some point. Now  it is not supported anymore.  It was
 developed for some old branch of Slurm. As Slurm has been deeply modified
 from then, it is now not usable.

 Marina Zapater has collected some information about it and put into her
 github.

 https://github.com/marinazapater/slurm-sim

 there you can download the simulator, input data and the correct Slurm
 version to make it work. It is designed to be executed on Ubuntu
 12.something or 14 (TLS), and I also don't know whether it runs on any
 other distro. I think this is currently the best starting point to get a
 running simulator.

 The code is however not perfect. It still presents some scalability
 issues (memory leaks? concurrency?) that make it fail when executing large
 simulations in terms of nodes or tasks. There is also no documentation at
 all, besides a high level description.

 I am just starting to get familiar with the simulator, so I cannot give
 you any more in-depth information. Also, I would welcome any more
 information, current versions or whatever you can find about this, so
 please feel free to submit any update to this list (or myself).


 Best regards,

 2015-08-03 22:26 GMT+02:00 Gonzalo Rodrigo Alvarez 
 gprodrigoalva...@lbl.gov
 javascript:_e(%7B%7D,'cvml','gprodrigoalva...@lbl.gov');:



 Good afternoon,

 I have set up the simulator-branch from the  SchedMd fork and I have
 been experiencing some issues with using the simulator. Some I could solve
 myself, but I am having trouble with them, if anybody had similar
 experiences, I think it would be a good thing for all to share. Let's start
 first with the one I have not been able to solve:

 When a group  of jobs run longer than their runtime, slurmctld sends the
 corresponding REQUEST_KILL_TIMELIMIT rpc, which triggers the creation of
 a number of threads. The first one arrives to slurmd. But It tries to
 create more than the available proto-threads and for some reason this leads
 to slurmctld to block. Any hint on this problem? Maybe I should run it on a
 bigger VM? Now I am using a single core VM,

 Now a list of things that I observed I could solve:
 - I observed that unless I would add a sleep(1) in some threads
 derived from slurmctld, newly created threads would make
 _checking_for_new_threads go on an infinite loop. In particular: agent
 thread, backfil agent, _slurmctld_rpc_mgr loop, _slurmctld_background loop,
 and in general all the agent loops of the plugins used. (I know it is not a
 clean solution, but it worked).
 - In sim_lib: I observed that get_new_thread_id return type was changed
 form int to uint. that broke the code in pthread_create that detects the
 case in which there are no available threads

[slurm-dev] Re: Messing with job checkpointing

2015-06-08 Thread Manuel Rodríguez Pascual
That's exactly what I was looking for, thanks very much.


2015-06-02 16:30 GMT+02:00 Moe Jette je...@schedmd.com:


 See the MinJobAge configuration option:
 http://slurm.schedmd.com/slurm.conf.html
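
For reference, MinJobAge defaults to 300 seconds, which matches the roughly
five-minute delay described below; lowering it (the value here is only
illustrative) lets vacated jobs be purged, and therefore restarted, sooner:

# slurm.conf: keep finished job records for 10 seconds instead of the default 300
MinJobAge=10

# then have slurmctld re-read the configuration (or restart the daemon)
scontrol reconfigure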


 Quoting Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com:

 Hi all,


 I have been performing some more tests trying to understand the slurm
 internals and to reduce the checkpoint/restart time.

 Looking into the job status with slurm_print_job_info, I have observed
 that
 it remains on RUNNING status for about 5 minutes after a
 slurm_checkpoint_vacate.

 JobId=2133 JobName=variableSizeTester.sh
UserId=slurm(500) GroupId=slurm(1000)
Priority=4294901754 Nice=0 Account=(null) QOS=(null)
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:05:39 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2015-06-02T06:43:16 EligibleTime=2015-06-02T06:43:16
StartTime=2015-06-02T06:43:17 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=debug AllocNode:Sid=slurm-master:2951
 (...)


 So when calling slurm_checkpoint_restart, slurmctld complains with

 attempt re-use active job_id 2133
 slurm_rpc_checkpoint restart 2133: Duplicate job id


 and same error is obtained until the aforementioned 5 minute limit, then
 the job record is released, cleaned

 slurmctld: debug2: Purging old records
 slurmctld: debug2: purge_old_job: purged 1 old job records

 and the checkpoint can then be restarted.

 I have tried calling purge_old_job() to reduce this time but it does not
 work, so I assume that the problem is that the job is considered to be
 running and not a missinformation by slurmctld. Also, there is no query
 from slurmctld to the compute element, this seems to be some kind of
 internal timeout or something like that. Am I right?

 My question is then, cannot be this time reduced somehow? Is there any
 particular reason why the job is considered as active by Slurmctld for
 like
 5 minutes after its checkpoint and cancellation?

 Thanks for your attention.


 Best regards,


 Manuel








 2015-05-29 18:00 GMT+02:00 Manuel Rodríguez Pascual 
 manuel.rodriguez.pasc...@gmail.com:

   Hi all,

 I have been messing around a little bit with task checkpoint/restart.

 I am employing BLCR to checkpoint a fairly small application with
 slurm_checkpoint_vacate, what should take several seconds. However, when
 I
 try to restart it with slurm_checkpoint_restart, the process is very
 slow.

 Looking at the output of slurmtcld ,  what I get is

 
 slurmctld: debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=0
 slurmctld: attempt re-use active job_id 2110
 slurmctld: _slurm_rpc_checkpoint restart 2110: Duplicate job id
 

 if I continue performing the same call, the output is identical for some
 time, until slurm cleans its internal structures (or something like
 that),
 writing in the log

 
 slurmctld: debug2: Testing job time limits and checkpoints
 slurmctld: debug2: Performing purge of old job records
 slurmctld: debug2: purge_old_job: purged 1 old job records
 slurmctld: debug:  sched: Running job scheduler
 slurmctld: debug:  backfill: beginning
 slurmctld: debug:  backfill: no jobs to backfill
 

 then, the next call to slurm_checkpoint_restart succeeds,  with

 
 slurmctld: debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=0
 slurmctld: debug2: found 9 usable nodes from config containing
 slurm-compute[1-9]
 slurmctld: debug2: sched: JobId=2110 allocated resources: NodeList=(null)
 slurmctld: _slurm_rpc_checkpoint restart for 2110 usec=909
 slurmctld: debug2: Testing job time limits and checkpoints
 slurmctld: debug:  backfill: beginning
 slurmctld: debug2: backfill: entering _try_sched for job 2110.
 slurmctld: debug2: found 2 usable nodes from config containing
 slurm-compute[1-9]
 slurmctld: backfill: Started JobId=2110 on slurm-compute2
 


 I am wondering why is all this necessary. Why can't the vacate call
 delete everything related to the job, so it can be restarted immediately?
 If there is any particular reason that makes that impossible, why cannot
 the Slurm structures be cleaned (purged or whatever) every 10 seconds or
 so, instead of once every 5-10 minutes? Does it cause a significant
 overhead or scalability issue? Or as an alternative,  is there any API
 call
 that can be employed to trigger that purge?


 Thanks for your help,


 Manuel






 --
 Dr. Manuel Rodríguez-Pascual
 skype: manuel.rodriguez.pascual
 phone: (+34) 913466173 // (+34) 679925108

 CIEMAT-Moncloa
 Edificio 22, desp. 1.25
 Avenida Complutense, 40
 28040- MADRID
 SPAIN




 --
 Dr. Manuel Rodríguez-Pascual
 skype: manuel.rodriguez.pascual
 phone: (+34) 913466173 // (+34) 679925108

 CIEMAT-Moncloa
 Edificio 22, desp. 1.25
 Avenida Complutense, 40
 28040- MADRID
 SPAIN



 --
 Morris Moe Jette
 CTO, SchedMD LLC
 Commercial Slurm Development and Support




-- 
Dr. Manuel

[slurm-dev] Re: Messing with job checkpointing

2015-06-02 Thread Manuel Rodríguez Pascual
Hi all,


I have been performing some more tests trying to understand the slurm
internals and to reduce the checkpoint/restart time.

Looking into the job status with slurm_print_job_info, I have observed that
it remains on RUNNING status for about 5 minutes after a
slurm_checkpoint_vacate.

JobId=2133 JobName=variableSizeTester.sh
   UserId=slurm(500) GroupId=slurm(1000)
   Priority=4294901754 Nice=0 Account=(null) QOS=(null)
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:05:39 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2015-06-02T06:43:16 EligibleTime=2015-06-02T06:43:16
   StartTime=2015-06-02T06:43:17 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=debug AllocNode:Sid=slurm-master:2951
(...)


So when calling slurm_checkpoint_restart, slurmctld complains with

attempt re-use active job_id 2133
slurm_rpc_checkpoint restart 2133: Duplicate job id


and the same error is obtained until the aforementioned 5-minute limit; then
the job record is released and cleaned

slurmctld: debug2: Purging old records
slurmctld: debug2: purge_old_job: purged 1 old job records

and the checkpoint can then be restarted.

I have tried calling purge_old_job() to reduce this time but it does not
work, so I assume that the problem is that the job really is considered to
be running, and not misinformation on slurmctld's side. Also, there is no
query from slurmctld to the compute element; this seems to be some kind of
internal timeout or something like that. Am I right?

My question is then: can't this time be reduced somehow? Is there any
particular reason why the job is considered active by slurmctld for about
5 minutes after its checkpoint and cancellation?

Thanks for your attention.


Best regards,


Manuel








2015-05-29 18:00 GMT+02:00 Manuel Rodríguez Pascual 
manuel.rodriguez.pasc...@gmail.com:

  Hi all,

 I have been messing around a little bit with task checkpoint/restart.

 I am employing BLCR to checkpoint a fairly small application with
 slurm_checkpoint_vacate, what should take several seconds. However, when I
 try to restart it with slurm_checkpoint_restart, the process is very slow.

 Looking at the output of slurmtcld ,  what I get is

 
 slurmctld: debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=0
 slurmctld: attempt re-use active job_id 2110
 slurmctld: _slurm_rpc_checkpoint restart 2110: Duplicate job id
 

 if I continue performing the same call, the output is identical for some
 time, until slurm cleans its internal structures (or something like that),
 writing in the log

 
 slurmctld: debug2: Testing job time limits and checkpoints
 slurmctld: debug2: Performing purge of old job records
 slurmctld: debug2: purge_old_job: purged 1 old job records
 slurmctld: debug:  sched: Running job scheduler
 slurmctld: debug:  backfill: beginning
 slurmctld: debug:  backfill: no jobs to backfill
 

 then, the next call to slurm_checkpoint_restart succeeds,  with

 
 slurmctld: debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=0
 slurmctld: debug2: found 9 usable nodes from config containing
 slurm-compute[1-9]
 slurmctld: debug2: sched: JobId=2110 allocated resources: NodeList=(null)
 slurmctld: _slurm_rpc_checkpoint restart for 2110 usec=909
 slurmctld: debug2: Testing job time limits and checkpoints
 slurmctld: debug:  backfill: beginning
 slurmctld: debug2: backfill: entering _try_sched for job 2110.
 slurmctld: debug2: found 2 usable nodes from config containing
 slurm-compute[1-9]
 slurmctld: backfill: Started JobId=2110 on slurm-compute2
 


 I am wondering why is all this necessary. Why can't the vacate call
 delete everything related to the job, so it can be restarted immediately?
 If there is any particular reason that makes that impossible, why cannot
 the Slurm structures be cleaned (purged or whatever) every 10 seconds or
 so, instead of once every 5-10 minutes? Does it cause a significant
 overhead or scalability issue? Or as an alternative,  is there any API call
 that can be employed to trigger that purge?


 Thanks for your help,


 Manuel






 --
 Dr. Manuel Rodríguez-Pascual
 skype: manuel.rodriguez.pascual
 phone: (+34) 913466173 // (+34) 679925108

 CIEMAT-Moncloa
 Edificio 22, desp. 1.25
 Avenida Complutense, 40
 28040- MADRID
 SPAIN




-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Messing with job checkpointing

2015-05-29 Thread Manuel Rodríguez Pascual
Hi all,

I have been messing around a little bit with task checkpoint/restart.

I am employing BLCR to checkpoint a fairly small application with
slurm_checkpoint_vacate, which should take several seconds. However, when I
try to restart it with slurm_checkpoint_restart, the process is very slow.

Looking at the output of slurmctld, what I get is


slurmctld: debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=0
slurmctld: attempt re-use active job_id 2110
slurmctld: _slurm_rpc_checkpoint restart 2110: Duplicate job id


if I continue performing the same call, the output is identical for some
time, until slurm cleans its internal structures (or something like that),
writing in the log


slurmctld: debug2: Testing job time limits and checkpoints
slurmctld: debug2: Performing purge of old job records
slurmctld: debug2: purge_old_job: purged 1 old job records
slurmctld: debug:  sched: Running job scheduler
slurmctld: debug:  backfill: beginning
slurmctld: debug:  backfill: no jobs to backfill


then, the next call to slurm_checkpoint_restart succeeds,  with


slurmctld: debug2: Processing RPC: REQUEST_CHECKPOINT(restart) from uid=0
slurmctld: debug2: found 9 usable nodes from config containing
slurm-compute[1-9]
slurmctld: debug2: sched: JobId=2110 allocated resources: NodeList=(null)
slurmctld: _slurm_rpc_checkpoint restart for 2110 usec=909
slurmctld: debug2: Testing job time limits and checkpoints
slurmctld: debug:  backfill: beginning
slurmctld: debug2: backfill: entering _try_sched for job 2110.
slurmctld: debug2: found 2 usable nodes from config containing
slurm-compute[1-9]
slurmctld: backfill: Started JobId=2110 on slurm-compute2



I am wondering why all this is necessary. Why can't the vacate call
delete everything related to the job, so it can be restarted immediately?
If there is any particular reason that makes that impossible, why cannot
the Slurm structures be cleaned (purged or whatever) every 10 seconds or
so, instead of once every 5-10 minutes? Does it cause a significant
overhead or scalability issue? Or as an alternative,  is there any API call
that can be employed to trigger that purge?


Thanks for your help,


Manuel






-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] error code in slurm_checkpoint_complete

2015-03-31 Thread Manuel Rodríguez Pascual
Hi all,

just a quick question regarding Slurm API.

In the following call,

(checkpoint.c)
/*
 * slurm_checkpoint_complete - note the completion of a job step's
checkpoint operation.
 * IN job_id  - job on which to perform operation
 * IN step_id - job step on which to perform operation
 * IN begin_time - time at which checkpoint began
 * IN error_code - error code, highest value for all complete calls is
preserved
 * IN error_msg - error message, preserved for highest error_code
 * RET 0 or a slurm error code
 */
extern int slurm_checkpoint_complete (uint32_t job_id, uint32_t step_id,
time_t begin_time, uint32_t error_code, char *error_msg);


what is error_code employed for? man says

Error code for checkpoint operation. Only the highest value is preserved.

but given that it is an input parameter I don't really see the point,
especially since another error code is already returned by the function
itself. Can anyone enlighten me?

Thanks for your attention. Best regards,


Manuel




-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Re: restarting checkpoint after slurm_checkpoint_vacate API call

2015-01-30 Thread Manuel Rodríguez Pascual
it works, thanks very much Moe :)


Just for the sake of completeness (in case someone ends up googling for
this), this is my complete code for a basic task checkpoint/restart.


int max_wait = 60;
char *checkpoint_location = "/home/slurm";

if (slurm_checkpoint_vacate(opt.jobid, opt.stepid,
                            max_wait, checkpoint_location) != 0) {
        slurm_perror("Error checkpointing: ");
        exit(errno);
}

int i = 0;
while (slurm_checkpoint_restart(opt.jobid, opt.stepid, 0,
                                checkpoint_location) != 0) {
        sleep(10);
        i = i + 10;
        slurm_perror("Error: ");
        printf(". Still not possible to restart. Time: %i\n", i);
}

printf("job has been restarted\n");



2015-01-29 17:53 GMT+01:00 je...@schedmd.com:


 The slurm_checkpoint_vacate() triggers a checkpoint operation, which could
 take minutes to complete. You can't slurm_checkpoint_restart() the job
 until the checkpoint operation and accounting are complete. Adding some
 sleep/retry logic should do what you want.


 Quoting Manuel Rodríguez Pascual manuel.rodriguez.pasc...@gmail.com:

  Good morning all,

 I am facing a problem when using slurm.h API to manage checkpoints.

 What I want to do is to checkpoint a running task, shut it down, and then
 restore it somewhere (in the same node or another one).

 slurm.conf is configured with:
 CheckpointType=checkpoint/blcr
 JobCheckpointDir=/home/slurm/

 My code, after initial verifications goes like:

  int max_wait = 60;
 if (slurm_checkpoint_vacate(opt.jobid, opt.stepid, max_wait,
 /home/slurm/) != 0)
 _show_error_and_exit();

 //just in case it is still not stopped
 slurm_kill_job(opt.jobid, 9,  KILL_JOB_ARRAY) ;

 char* checkpoint_location = /home/slurm;
 if ( slurm_checkpoint_restart(opt.jobid, opt.stepid, 0,
  checkpoint_location) != 0)
 _show_error_and_exit();
 

 The errno and error message I get is:

 2011: Duplicate job id

 and this content in slurmctld:

 re-use active job_id 2570
 slurmctld: _slurm_rpc_checkpoint restart 2570: Duplicate job id
 

 if I do instead

 if ( slurm_checkpoint_restart(opt.jobid +1 , opt.stepid, 0,
  checkpoint_location) != 0)

 The errno and error message I get is:

 2: No such file or directory

 and this content in slurmctld:

 No job ckpt file (/home/slurm//2571.ckpt) to read
 slurmctld: _slurm_rpc_checkpoint restart 2570: No such file or directory
 
 Which is right, the file does not exist, so of course it cannot start it.
 However if I specify /home/slurm/2570/ as image_dir, the folder created
 by the checkpoint_vacate call, the result is the same.

 Besides that, it seems that the input parameter   image_dir is not read,
 only the default parameter. So if i set my checkpoint_location to
 /a/b/c for example, the output log returns the same error, showing that
 it is trying to find the image in /home/slurm.


 So, this said, have you got any help or suggestion on how to deal with
 checkpoints with Slurm API? Am I doing something wrong? Is there any
 working example I can see? Should I be using other call instead of these
 ones?


 Thanks for your help. Best regards,


 Manuel



 --
 Morris Moe Jette
 CTO, SchedMD LLC
 Commercial Slurm Development and Support




-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] restarting checkpoint after slurm_checkpoint_vacate API call

2015-01-29 Thread Manuel Rodríguez Pascual
Good morning all,

I am facing a problem when using slurm.h API to manage checkpoints.

What I want to do is to checkpoint a running task, shut it down, and then
restore it somewhere (in the same node or another one).

slurm.conf is configured with:
CheckpointType=checkpoint/blcr
JobCheckpointDir=/home/slurm/

My code, after initial verifications goes like:

int max_wait = 60;
if (slurm_checkpoint_vacate(opt.jobid, opt.stepid, max_wait,
                            "/home/slurm/") != 0)
        _show_error_and_exit();

// just in case it is still not stopped
slurm_kill_job(opt.jobid, 9, KILL_JOB_ARRAY);

char *checkpoint_location = "/home/slurm";
if (slurm_checkpoint_restart(opt.jobid, opt.stepid, 0,
                             checkpoint_location) != 0)
        _show_error_and_exit();


The errno and error message I get is:

2011: Duplicate job id

and this content in slurmctld:

re-use active job_id 2570
slurmctld: _slurm_rpc_checkpoint restart 2570: Duplicate job id


if I do instead

if ( slurm_checkpoint_restart(opt.jobid +1 , opt.stepid, 0,
 checkpoint_location) != 0)

The errno and error message I get is:

2: No such file or directory

and this content in slurmctld:

No job ckpt file (/home/slurm//2571.ckpt) to read
slurmctld: _slurm_rpc_checkpoint restart 2570: No such file or directory

Which is correct: the file does not exist, so of course it cannot start it.
However, if I specify /home/slurm/2570/ (the folder created by the
checkpoint_vacate call) as image_dir, the result is the same.

Besides that, it seems that the input parameter image_dir is not read,
only the default one. So if I set my checkpoint_location to
"/a/b/c", for example, the output log returns the same error, showing that
it is trying to find the image in /home/slurm.


So, this said, have you got any help or suggestion on how to deal with
checkpoints with Slurm API? Am I doing something wrong? Is there any
working example I can see? Should I be using other call instead of these
ones?


Thanks for your help. Best regards,


Manuel


[slurm-dev] Re: reccomended software stack for development?

2014-10-28 Thread Manuel Rodríguez Pascual
Hi all,

Thanks for your valuable suggestions and different points of view. I will
follow Andy's suggestion and try to keep everything as simple as possible,
slowly integrating the new tools as soon as I feel comfortable with the
existing ones.

Just in case you are interested, I am currently using OpenNebula to manage
a KVM+CentOS private cloud in my workstation, where I run my
Slurm+MPICH+BLCR test cluster. Easy to install and configure, and really
useful to manage virtual images, deploy them and so with a simple web
interface.

Again, thanks for your support and your warm welcome.

Best regards,

Manuel

2014-10-27 19:10 GMT+01:00 r...@q-leap.de:


  Manuel == Manuel Rodríguez Pascual 
 manuel.rodriguez.pasc...@gmail.com writes:

 Hi Manuel,

 Manuel Hi all, I have the intention of working on Slurm, modifying
 Manuel it to satisfy my needs and (hopefully) include some new
 Manuel functionalities. I am however kind of newbie with this kind
 Manuel of software development, so I am writing looking for
 Manuel advise. My question is, can you recommend me any tools for
 Manuel the development of slurm?

 I agree with Andy, that it's best to view this as 2 separate tasks (cluster
 setup/management + slurm development).

 For your cluster setup, you could use Qlustar which will allow you to
 easily setup a ready to run virtual demo cluster incl. a functioning
 slurm and OpenMPI in about 30 min (no exaggeration, just follow
 https://www.qlustar.com/book/docs/install-guide
 and https://www.qlustar.com/book/docs/first-steps).
 The Qlustar Basic Edition is free for academic usage and has everything
 needed for your use case.

 Once setup, you have all the tools of Ubuntu or Debian at your
 finger-tips to jump into development.

 Good luck,

 Roland

 ---
 http://www.q-leap.com / http://qlustar.com
   --- HPC / Storage / Cloud Linux Cluster OS ---

 Manuel As a first layer, my idea is to use plain virtual machines
 Manuel and employ Puppet to configure them and then install MPICH
 Manuel and BLCR. Then, Jenkins would install and configure a
 Manuel Slurm-based cluster and run a set of tests.

 Manuel I am however new in using both tools and in developing
 Manuel Slurm, so I am kind of lost right now. then, before starting
 Manuel to build and configure all this, I would really appreciate
 Manuel some suggestions from more experienced developers.

 Manuel I have planned to clone Slurm github repo to work with my
 Manuel own github, and then employ Jenkins for Continuous
 Manuel Integration. I have some doubts on how to exactly do that,
 Manuel in particular regarding the contextualization of the
 Manuel compilation process, and the integration of the included
 Manuel regression tests with Jenkins. Have you got any suggestions
 Manuel on this? Again, any feedback on the best tools to work with
 Manuel Slurm would be welcome.

 Manuel Thanks for your help. Best regards,


 Manuel Manuel




-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[slurm-dev] Slurm script not writing to stdout in computing node

2014-10-20 Thread Manuel Rodríguez Pascual
)
ResvOverRun = 0 min
ResvProlog  = (null)
ReturnToService = 1
SallocDefaultCommand= (null)
SchedulerParameters = (null)
SchedulerPort   = 7321
SchedulerRootFilter = 1
SchedulerTimeSlice  = 30 sec
SchedulerType   = sched/backfill
SelectType  = select/linear
SlurmUser   = slurm(500)
SlurmctldDebug  = info
SlurmctldLogFile= (null)
SlurmSchedLogFile   = (null)
SlurmctldPort   = 6817
SlurmctldTimeout= 120 sec
SlurmdDebug = info
SlurmdLogFile   = (null)
SlurmdPidFile   = /var/run/slurmd.pid
SlurmdPlugstack = (null)
SlurmdPort  = 6818
SlurmdSpoolDir  = /var/spool/slurmd
SlurmdTimeout   = 300 sec
SlurmdUser  = root(0)
SlurmSchedLogLevel  = 0
SlurmctldPidFile= /var/run/slurmctld.pid
SlurmctldPlugstack  = (null)
SLURM_CONF  = /usr/local/etc/slurm.conf
SLURM_VERSION   = 14.03.8
SrunEpilog  = (null)
SrunProlog  = (null)
StateSaveLocation   = /var/spool/slurmState
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram  = (null)
SuspendRate = 60 nodes/min
SuspendTime = NONE
SuspendTimeout  = 30 sec
SwitchType  = switch/none
TaskEpilog  = (null)
TaskPlugin  = task/none
TaskPluginParam = (null type)
TaskProlog  = (null)
TmpFS   = /tmp
TopologyPlugin  = topology/none
TrackWCKey  = 0
TreeWidth   = 50
UsePam  = 0
UnkillableStepProgram   = (null)
UnkillableStepTimeout   = 60 sec
VSizeFactor = 0 percent
WaitTime= 0 sec
---
---



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN