Thank you so much for the detailed replies, Dr. Lucero. I greatly
appreciate your help.
Looking forward to hearing from you regarding the documentation and
the example.
Thanks again.
Sincerely,
T
On Wed, Feb 13, 2013 at 3:31 AM, Alejandro Lucero Palau
<[email protected]> wrote:
Hi,
Coming back after two days off.
On 02/08/2013 11:02 PM, Tapasya Patki wrote:
Hello Dr. Lucero,
Thank you for being so prompt with your replies. I greatly
appreciate your help. It would be nice to have some documentation
on the trace file format and a very simple example of how we can
use a new scheduling algorithm with the slurm simulator. I am
looking forward to an update from you on these things.
I'm working on this and I will let you know when it is ready.
The other open source job scheduling simulator I came across was
GridSim + Alea-3. I'm not sure how this compares to the SLURM
simulator, and not sure how it differs (Grids vs clusters?) from
the SLURM simulator. Do you have any comments on this?
I do not know GridSim. I have read some papers that used some kind of
simulation, but they were really poor approximations, like dividing
job execution times by 1000 or so and then running the
software with those jobs. Moab's simulation mode probably did (or does)
a better job, but I do not know how it is implemented. The
slurm simulator had the goal of using the core slurm code with
minimal changes. That implies a complex mechanism for
handling threads, but on the other hand it makes it trivial to upgrade
the simulator to new versions. If this work goes into the main Slurm
tree, the idea is to have pieces of code specific to simulation without
requiring the main core developers to be aware of this functionality.
I also had another general question, and I'm assuming there are
a lot of people with expertise in this area on this listserv. So I
thought I'd ask them here.
Do we have any statistics on how many supercomputing centers use
SLURM? I know most of them use MOAB with SLURM, but MOAB is not
open source (correct me if I'm wrong here). I'm not sure how MOAB
and SLURM interact, so any insight into that will be useful too.
I do not know what percentage of the TOP100 use Slurm, but the
TOP1 machine uses it, and I guess so do all the machines using BGQ.
Moab is a scheduler with a good number of features and
configuration options. It needs a resource manager for launching
jobs and controlling nodes and jobs. Slurm does this when working
with Moab. There's a slurm plugin, wiki/wiki2, under the
plugins/sched directory that defines how Moab and slurm cooperate.
Thanks.
Sincerely,
T
On Fri, Feb 8, 2013 at 11:12 AM, Alejandro Lucero Palau
<[email protected]> wrote:
Hi, Tapasya
I'm glad to see you need more info about it. Until now this
trace format has been really specific to me. I have not worked
on improving or documenting it since I did the simulation core
work. I'll work on this as soon as possible.
Adding a new plugin should be as easy as with normal
slurm. You should be aware of the simulator's basic behaviour to
avoid problems under simulation. As you say, an example of this
would be really useful.
It seems I have work
On 02/08/2013 05:21 PM, Tapasya Patki wrote:
Thank you so much for the prompt reply, Dr. Lucero.
After putting in some effort, I could build the slurm
simulator on my machine. I had a few questions, though, and
there's not enough documentation on how to use the slurm
simulator yet (I'm willing to write and share some of my
build/run experiences once I have a stable enough work
environment).
1. Can you provide an accurate description of the following
inputs in trace_builder?
--tasks-per-node
--cpus
--cpus-per-task
--submit-time
2. How do I plug in and test a new scheduling policy with
the simulator? Is there a dummy hello world example for this?
3. To simulate a job mix on N nodes, do I need to run the
simulator on N physical nodes? This is unclear because I saw
a couple of "more processors requested than available" sort
of errors with your trace file. Also, in the trace
representation, what do the "x (y, z)" numbers indicate in
the tasks column? And what does WCLimit stand for?
Thank you so much for your help. Also, having an open source
database with real trace files and slurm conf files will be
very useful.
Sincerely,
Tapasya Patki
Department of Computer Science
University of Arizona
On Fri, Feb 8, 2013 at 9:09 AM, Alejandro Lucero Palau
<[email protected]> wrote:
The last two weeks have been very productive debugging
the simulator workbench (Thanks Maciej!!!)
There's a new sim_test_dir workbench with some patches
and modifications:
http://www.bsc.es/marenostrum-support-services/services/slurm-simulator
Also, instructions for installing the simulator on Ubuntu in a
virtual machine should make the
process easier.
There's a port for using the simulator with Slurm 2.5
that will be available next week.
As there are several people trying to use the simulator
for validating research, I wonder if it is time to
create a database with trace files along with slurm
configuration files taken from real production machines.
I know this data is treated as a treasure by some
centers but in my opinion, it could be more useful for
researchers. Come on, this is the open source world!!!
On 02/08/2013 07:54 AM, Tapasya Patki wrote:
Hello,
I am trying to build the slurm simulator and am encountering several
problems
(http://www.bsc.es/marenostrum-support-services/services/slurm-simulator).
I wanted to check if a newer version was available, or if better
documentation was available, and if someone is actively working on
the simulator's development at the moment. Previously, the author
(Alejandro Lucero) had mentioned some interest in creating a Virtual
Machine environment with the simulator pre-installed-- is there any
update on this?
Alternatively, is there any other open source simulator similar to
the slurm simulator available?
Thank you for your help.
Sincerely,
Tapasya Patki
Department of Computer Science
University of Arizona
WARNING / LEGAL TEXT: This message is intended only for
the use of the individual or entity to which it is
addressed and may contain information which is
privileged, confidential, proprietary, or exempt from
disclosure under applicable law. If you are not the
intended recipient or the person responsible for
delivering the message to the intended recipient, you
are strictly prohibited from disclosing, distributing,
copying, or in any way using this message. If you have
received this communication in error, please notify the
sender and destroy and delete any copies you may have
received.
http://www.bsc.es/disclaimer
<http://www.bsc.es/disclaimer.htm>