Hi Gonzalo,

I am not an expert, so please take the rest of the mail as an opinion and
not completely reliable information.

As far as I know, the Slurm simulator was developed by a single person and
was abandoned at some point, so it is no longer supported. It was
developed against an old branch of Slurm. As Slurm has changed substantially
since then, the simulator is not usable with current Slurm versions.

Marina Zapater has collected some information about it and put it on her
GitHub:

https://github.com/marinazapater/slurm-sim

There you can download the simulator, input data and the correct Slurm
version to make it work. It is designed to run on Ubuntu 12.something
or 14 (LTS), and I don't know whether it runs on any other distro. I
think this is currently the best starting point to get a running
simulator.

The code is, however, not perfect. It still has some scalability issues
(memory leaks? concurrency?) that make it fail when running simulations
with a large number of nodes or tasks. There is also no documentation at
all, besides a high-level description.

I am just starting to get familiar with the simulator, so I cannot give you
any more in-depth information. I would also welcome any further information,
current versions or whatever else you can find about this, so please feel
free to send any update to this list (or to me).


Best regards,

2015-08-03 22:26 GMT+02:00 Gonzalo Rodrigo Alvarez <[email protected]>:

>
>
> Good afternoon,
>
> I have set up the simulator branch from the SchedMD fork and I have been
> experiencing some issues with the simulator. Some I could solve myself,
> but I am still having trouble with others. If anybody has had similar
> experiences, I think it would be good for all of us to share them. Let's
> start with the one I have not been able to solve:
>
> When a group of jobs runs longer than their time limit, slurmctld sends the
> corresponding "REQUEST_KILL_TIMELIMIT" RPC, which triggers the creation of
> a number of threads. The first one reaches slurmd, but it then tries to
> create more threads than the available proto-threads, and for some reason
> this causes slurmctld to block. Any hint on this problem? Maybe I should
> run it on a bigger VM? I am currently using a single-core VM.
>
> Now a list of things that I observed and could solve:
> - Unless I added a "sleep(1)" in some threads derived from slurmctld,
> newly created threads would make "_checking_for_new_threads" go into an
> infinite loop. In particular: the agent thread, the backfill agent, the
> _slurmctld_rpc_mgr loop, the _slurmctld_background loop, and in general
> all the agent loops of the plugins used. (I know it is not a clean
> solution, but it worked.)
> - In sim_lib: the return type of get_new_thread_id was changed from int
> to uint. That broke the code in pthread_create that detects the case in
> which there are no available threads (it returns -1).
> - I had to rewrite the way the sleep wrapper and the _time_mgr
> communicate through thread_sem and thread_sem_back. As it was, it kept
> blocking all the time.
>
> Encountering these problems made me wonder whether I am working with the
> correct branch (schedmd/simulator), or whether the evolution of Slurm is
> making the simulator code rot. When the solutions are more stable and
> clean I will prepare a patch and post it here.
>
> Also a note to someone who was asking about the test.traces file (I am
> answering here because the Google Groups interface would not allow me to
> reply to it): you can find it, together with a synthetic user list, in the
> original tar file distributed by the BSC; just put it in the sbin dir:
> http://www.bsc.es/marenostrum-support-services/services/slurm-simulator
>
> Thanks in advance!
>
> //Gonzalo
>
>
>


-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
