Just to add to this thread: I am a huge proponent of sharing trace files. Here is a link to a repository of workload traces and their formats.

http://www.cs.huji.ac.il/labs/parallel/workload/

It is not very "active", but if we can get more folks interested in sharing workloads, perhaps we can change that.
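
For anyone who wants to try those traces, here is a minimal Python sketch for reading them. It assumes the files follow the archive's Standard Workload Format (SWF): header/comment lines start with ";", and each job is one line of 18 whitespace-separated numeric fields, with -1 meaning "unknown". The names SwfJob and parse_swf and the sample data are illustrative only; they are not part of the archive or of the slurm simulator, and this does not convert anything to the simulator's own trace format.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SwfJob:
    """A subset of the 18 SWF fields (hypothetical container, not an official API)."""
    job_id: int
    submit_time: int       # seconds since the start of the trace
    wait_time: int         # seconds spent in the queue
    run_time: int          # seconds of wall-clock execution
    allocated_procs: int
    requested_procs: int
    requested_time: int    # user runtime estimate, in seconds
    status: int


def parse_swf(lines: Iterable[str]) -> Iterator[SwfJob]:
    """Yield one SwfJob per data line, skipping ';' header lines and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) < 18:
            # Assumption: lines with missing columns are skipped, not repaired.
            continue
        # int(float(...)) tolerates values written with a decimal point.
        num = [int(float(x)) for x in fields[:18]]
        yield SwfJob(
            job_id=num[0],
            submit_time=num[1],
            wait_time=num[2],
            run_time=num[3],
            allocated_procs=num[4],
            requested_procs=num[7],
            requested_time=num[8],
            status=num[10],
        )


# Made-up sample data in SWF layout, for illustration only.
sample = """\
; Version: 2.2
; MaxProcs: 128
1 0 10 3600 64 -1 -1 64 7200 -1 1 3 1 -1 1 -1 -1 -1
2 30 0 120 8 -1 -1 8 600 -1 1 5 1 -1 1 -1 -1 -1
"""

jobs = list(parse_swf(sample.splitlines()))
for job in jobs:
    print(job.job_id, job.submit_time, job.requested_procs, job.run_time)
```

To read a real trace, pass an open file object to parse_swf instead of the sample lines.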

-Jerry

Tapasya Patki wrote:
Thank you so much for the detailed replies, Dr. Lucero. I greatly appreciate your help. Looking forward to hearing from you regarding the documentation and the example.

Thanks again.

Sincerely,
T

On Wed, Feb 13, 2013 at 3:31 AM, Alejandro Lucero Palau <[email protected]> wrote:

    Hi,

    Coming back after two days off.



    On 02/08/2013 11:02 PM, Tapasya Patki wrote:
    Hello Dr. Lucero,

    Thank you for being so prompt with your replies. I greatly
    appreciate your help. It would be nice to have some documentation
    on the trace file format and a very simple example of how we can
    use a new scheduling algorithm with the slurm simulator. I am
    looking forward to an update from you on these things.

    I'm working on this and I will let you know when it is ready.


    The other open source job scheduling simulator I came across was
    GridSim + Alea-3. I'm not sure how this compares to the SLURM
    simulator, and not sure how it differs (Grids vs clusters?) from
    the SLURM simulator. Do you have any comments on this?


    I do not know GridSim. I have read some papers that used some kind
    of simulation, but they were really poor approximations, like
    dividing job execution times by 1000 or so and then running the
    software with those jobs. Moab's simulation mode did (or does)
    probably a better job, but I do not know how it is implemented. The
    slurm simulator had the goal of using the core slurm code with
    minimal changes. That implies a complex mechanism for handling
    threads, but on the other hand it makes it trivial to upgrade the
    simulator to new versions. If this work goes into the main Slurm
    tree, the idea is to keep the simulation-specific code in separate
    pieces, so that the main core developers do not need to be aware
    of this functionality.

    I also had another general question, and I'm assuming there are
    a lot of people with expertise in this area on this listserv. So I
    thought I'd ask them here.

    Do we have any statistics on how many supercomputing centers use
    SLURM? I know most of them use MOAB with SLURM, but MOAB is not
    open source (correct me if I'm wrong here). I'm not sure how MOAB
    and SLURM interact, so any insight into that will be useful too.



    I do not know what percentage of the TOP100 is using Slurm, but the
    TOP1 machine is using it, and I guess so are all the BGQ machines.

    Moab is a scheduler with a good number of features and
    configuration options. It needs a resource manager for launching
    jobs and controlling nodes and jobs. Slurm does this when working
    with Moab. There's a plugin for slurm wiki/wiki2 under the
    plugins/sched directory that defines how Moab and slurm cooperate.


    Thanks.

    Sincerely,
    T

    On Fri, Feb 8, 2013 at 11:12 AM, Alejandro Lucero Palau
    <[email protected]> wrote:

        Hi, Tapasya

        I'm glad to see you need more info about it. Until now this
        trace format has been really specific to me. I have not
        worked on improving or documenting it since I did the
        simulation core work. I'll work on this as soon as possible.

        Adding a new plugin should be as easy as with normal slurm.
        You should be aware of the simulator's basic behaviour to
        avoid problems under simulation. As you say, an example of
        this would be really useful.

        It seems I have work


        On 02/08/2013 05:21 PM, Tapasya Patki wrote:
        Thank you so much for the prompt reply, Dr. Lucero.
        After putting in some effort, I could build the slurm
        simulator on my machine. I had a few questions, though, and
        there's not enough documentation on how to use the slurm
        simulator yet (I'm willing to write and share some of my
        build/run experiences once I have a stable enough work
        environment).

        1. Can you provide an accurate description of the following
        inputs in trace_builder?

        --tasks-per-node
        --cpus
        --cpus-per-task
        --submit-time

        2. How do I plug in and test a new scheduling policy with
        the simulator? Is there a dummy hello world example for this?

        3. To simulate a job mix on N nodes, do I need to run the
        simulator on N physical nodes? This is unclear because I saw
        a couple of "more processors requested than available" sort
        of errors with your trace file. Also, in the trace
        representation, what do the "x (y, z)" numbers indicate in
        the tasks column? And what does WCLimit stand for?

        Thank you so much for your help. Also, having an open source
        database with real trace files and slurm conf files will be
        very useful.

        Sincerely,

        Tapasya Patki
        Department of Computer Science
        University of Arizona

        On Fri, Feb 8, 2013 at 9:09 AM, Alejandro Lucero Palau
        <[email protected]> wrote:

            The last two weeks have been very productive debugging
            the simulator workbench (Thanks Maciej!!!)

            There's a new sim_test_dir workbench with some patches
            and modifications:

            
            http://www.bsc.es/marenostrum-support-services/services/slurm-simulator

            Also, instructions for installing the simulator on Ubuntu
            under a virtual machine should make the process easier.

            There's a port for using the simulator with Slurm 2.5
            that will be available next week.

            As there are several people trying to use the simulator
            for validating research, I wonder if it is time to
            create a database with trace files along with slurm
            configuration files taken from real production machines.
            I know this data is treated as a treasure by some
            centers but in my opinion, it could be more useful for
            researchers. Come on, this is open source world!!!




            On 02/08/2013 07:54 AM, Tapasya Patki wrote:
            Hello,

            I am trying to build the slurm simulator and am encountering
            several problems
            (http://www.bsc.es/marenostrum-support-services/services/slurm-simulator).
            I wanted to check if a newer version was available, or if better
            documentation was available, and if someone is actively working
            on the simulator's development at the moment. Previously, the
            author (Alejandro Lucero) had mentioned some interest in creating
            a Virtual Machine environment with the simulator pre-installed--
            is there any update on this?

            Alternatively, is there any other open source simulator similar
            to the slurm simulator available?

            Thank you for your help.

            Sincerely,

            Tapasya Patki
            Department of Computer Science
            University of Arizona



            WARNING / LEGAL TEXT: This message is intended only for
            the use of the individual or entity to which it is
            addressed and may contain information which is
            privileged, confidential, proprietary, or exempt from
            disclosure under applicable law. If you are not the
            intended recipient or the person responsible for
            delivering the message to the intended recipient, you
            are strictly prohibited from disclosing, distributing,
            copying, or in any way using this message. If you have
            received this communication in error, please notify the
            sender and destroy and delete any copies you may have
            received.

            http://www.bsc.es/disclaimer
            <http://www.bsc.es/disclaimer.htm>











