Hi Akhil,

The simulator gets its jobs from a trace file, "test.trace", reservations
from another file, "rsv.trace", and users from "users.sim". If you want
your own jobs, you need to create another test.trace file. The slurm
used by the simulator does not have any limitation regarding job
requirements, so you can submit jobs using all the options sbatch
supports. However, sim_mgr and the job trace file format have some
limitations, so if you need specific job options like min and max for
nodes or cpus, you have to modify the sim_mgr.c program. I would like to
overcome this limitation but I do not have time to work on it right
now. You can take a look at the attached file, where someone else made
some changes to a basic program for creating job traces.

No documentation is available apart from the simulator paper, where the
design is explained, and the installation docs you have already read.
You had better start getting familiar with slurm commands like sbatch
and with the slurmctld configuration parameters.



On 07/29/2013 11:02 PM, Akhil langer wrote:
> Can you also please tell me whether I can simulate moldable jobs,
> i.e. jobs that do not have a fixed number of requested nodes but
> instead specify the minimum and maximum number of nodes that the job
> should run on? If yes, where can I specify the min and max number of
> nodes for the job? Is there simulator documentation that can answer
> some of these basic questions of mine? Please let me know. Thanks!
>
>
> On Mon, Jul 29, 2013 at 9:01 AM, Akhil langer <[email protected]
> <mailto:[email protected]>> wrote:
>
>     Thanks Alejandro for your reply.
>     Yes, this is what I am looking for.
>     Can you please tell where to specify the list of jobs that I want
>     the simulator to schedule?
>
>     Thanks,
>     Akhil
>
>
>     On Mon, Jul 29, 2013 at 3:15 AM, Alejandro Lucero Palau
>     <[email protected] <mailto:[email protected]>> wrote:
>
>         Hi Akhil,
>
>         That's great!
>
>         So the reason behind the simulator was to test how the slurm
>         scheduler reacts to tuning of its scheduling parameters. You
>         can reproduce a workload using different parameter values,
>         like fair sharing percentage or queue priorities. Or maybe you
>         need to know what the impact of a weekly reservation would be...
>
>         However, I think the simulator is a really useful tool for
>         researchers as well. What you are doing is exactly what the
>         simulator is designed for: it avoids having to reproduce a
>         scheduling algorithm based on "simple" parameters. The more
>         complexity you put in a synthetic scheduler, the more sense it
>         makes to use a real scheduler. So the simulator allows you to
>         test the current slurm algorithms or the one you are working
>         on. A significant number of details affect scheduling, and
>         they are hard to implement from scratch. So why not use a real
>         scheduler with years of development behind it? The simulator
>         does not change how slurm works, so it is probably a good tool
>         for testing workloads covering a long period in just a few
>         hours. For example, I can take the last 3 months of workload
>         from our big machine and reproduce it in the simulator in just
>         a couple of hours.
>
>         I hope you find it useful.
>
>         Best regards
>
>
>         On 07/27/2013 11:23 PM, Akhil langer wrote:
>>         Hi Alejandro,
>>
>>         I did the installation from scratch again and it is now
>>         working. Thanks for your help!
>>         Can you please give the answers to my other questions in my
>>         last reply. Thanks again!
>>
>>         Akhil
>>
>>
>>         On Fri, Jul 26, 2013 at 1:15 PM, Akhil langer
>>         <[email protected] <mailto:[email protected]>> wrote:
>>
>>             Hi Alejandro,
>>
>>             I want to measure slurm's scheduling throughput and/or
>>             see how slurm schedules a given set of jobs. We are
>>             writing a simple scheduler (that does not use/require
>>             slurm) for our specific problem and want to see how it
>>             compares with slurm's scheduling policy. So the use case
>>             of the simulator is very simple: given a set of jobs, we
>>             want to know how (in which order) slurm would execute
>>             them. Do you think doing these simulations will be
>>             difficult with the simulator? Can you please share
>>             anything that would ease these experiments of mine, as
>>             you might have done this before?
>>
>>             I ran reset.sh and I am getting a different problem now.
>>             slurmctld does not start when exec_sim.pl is called. It
>>             is again having issues changing the owner/permissions of
>>             a file. Log files are attached.
>>
>>
>>             On Fri, Jul 26, 2013 at 12:32 PM, Alejandro Lucero Palau
>>             <[email protected]
>>             <mailto:[email protected]>> wrote:
>>
>>                 Hi Akhil,
>>
>>                 It is working fine. It seems slurmctld has a job with
>>                 that jobid from previous executions.
>>
>>                 Just execute the reset.pl script before exec_sim.pl.
>>
>>                 All of this just gives you an easy way to test the
>>                 simulator but you will need to work a bit harder for
>>                 getting something useful from it.
>>
>>                 By the way, what do you have in mind about using the
>>                 simulator?
>>
>>                 I have not had time lately to work on it but I will
>>                 as soon as I get a chance.
>>
>>                 Regards
>>
>>
>>                 On 07/26/2013 11:41 AM, Akhil langer wrote:
>>>                 Alejandro,
>>>                 Please find attached the log files.
>>>
>>>                 Thanks,
>>>                 Akhil
>>>
>>>
>>>                 On Fri, Jul 26, 2013 at 1:28 AM, Alejandro Lucero
>>>                 Palau <[email protected]
>>>                 <mailto:[email protected]>> wrote:
>>>
>>>                     Hi Akhil,
>>>
>>>                     This should not happen if you have followed
>>>                     instructions about user installation.
>>>
>>>                     Please, send me the log files including sim_mgr.log
>>>
>>>
>>>
>>>                     On 07/25/2013 07:33 PM, Akhil langer wrote:
>>>>                     Thanks Alejandro,
>>>>
>>>>                     That solved the problem. Now all the daemons
>>>>                     start. However, exec_pl gives this error for
>>>>                     every job:
>>>>                     sbatch: error: Batch job submission failed: I/O
>>>>                     error writing script/environment to file.
>>>>
>>>>                     Can you please tell which file it is trying to
>>>>                     write, I can change its permissions.
>>>>
>>>>                     Thanks,
>>>>                     Akhil
>>>>
>>>>
>>>>                     On Thu, Jul 25, 2013 at 5:37 AM, Alejandro
>>>>                     Lucero Palau <[email protected]
>>>>                     <mailto:[email protected]>> wrote:
>>>>
>>>>                         Hi Akhil,
>>>>
>>>>                         It seems that slurmctld cannot contact
>>>>                         slurmdbd.
>>>>
>>>>                         exec_sim.pl starts the controller and
>>>>                         slurmd, but it relies on slurmdbd already
>>>>                         running.
>>>>
>>>>                         If you have more problems once you start
>>>>                         the slurmdbd daemon, I will need the
>>>>                         sim_mgr.log file as well.
>>>>
>>>>                         Regards
>>>>
>>>>                         On 07/24/2013 07:42 PM, Akhil langer wrote:
>>>>>                         Alejandro,
>>>>>
>>>>>                         I have attached all the logs.
>>>>>                         I am using Ubuntu 12.04.02 instead of
>>>>>                         12.04.01. Also, I am using VirtualBox and
>>>>>                         not VMware.
>>>>>                         Thanks for the help!
>>>>>
>>>>>
>>>>>                         On Wed, Jul 24, 2013 at 10:31 AM,
>>>>>                         Alejandro Lucero Palau
>>>>>                         <[email protected]
>>>>>                         <mailto:[email protected]>> wrote:
>>>>>
>>>>>                             Hi,
>>>>>
>>>>>                             That error should not be the problem.
>>>>>
>>>>>                             Can you send me the full log files?
>>>>>
>>>>>                             Are you using the same distribution
>>>>>                             and VM as described in the
>>>>>                             installation instructions file?
>>>>>
>>>>>
>>>>>
>>>>>                             On 07/23/2013 09:25 PM, Akhil langer
>>>>>                             wrote:
>>>>>>                             I followed all the instructions for
>>>>>>                             getting started with the slurm
>>>>>>                             simulator on a new Ubuntu VM.
>>>>>>                             Everything seems fine. But when I run
>>>>>>                             ./exec_sim.pl SIM_DIR 100, I get the
>>>>>>                             following error in the slurmctld.log
>>>>>>                             file:
>>>>>>                             slurmctld: error: unable to open
>>>>>>                             pidfile /var/run/slurmctld.pid:
>>>>>>                             Permission denied
>>>>>>
>>>>>>                             /var/run/* files are readable by
>>>>>>                             users; I am not sure why slurm is
>>>>>>                             trying to open it in write mode.
>>>>>>                             This error goes away if I run sudo
>>>>>>                             ./exec_sim.pl, but then other errors
>>>>>>                             come up, such as SlurmUser is not set
>>>>>>                             to root, etc.
>>>>>>
>>>>>>                             How can I fix this?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                             WARNING / LEGAL TEXT: This message is
>>>>>                             intended only for the use of the
>>>>>                             individual or entity to which it is
>>>>>                             addressed and may contain information
>>>>>                             which is privileged, confidential,
>>>>>                             proprietary, or exempt from disclosure
>>>>>                             under applicable law. If you are not
>>>>>                             the intended recipient or the person
>>>>>                             responsible for delivering the message
>>>>>                             to the intended recipient, you are
>>>>>                             strictly prohibited from disclosing,
>>>>>                             distributing, copying, or in any way
>>>>>                             using this message. If you have
>>>>>                             received this communication in error,
>>>>>                             please notify the sender and destroy
>>>>>                             and delete any copies you may have
>>>>>                             received.
>>>>>
>>>>>                             http://www.bsc.es/disclaimer
>>>>>                             <http://www.bsc.es/disclaimer.htm>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
>
>
>
>



// Author : Alejandro Lucero Palau <[email protected]>
// Some additions by anetto <[email protected]>

/*
 * qos.sim and log.csv must exist, and there may be a test.trace file
 *
 * qos.sim example (one qos name per line):
 *   low
 *   normal
 *   high
 *
 * log.csv must contain one job per line, with columns separated by
 * semicolons (";"); spaces are skipped
 * columns are: "loginname; submit_time; duration; wc_limit; tasks"
 * log.csv example:
 *   user330;1532;1500;3600;2
 *   user642;1612;446;3600;2
 * submit_time (column 2) must increase down the file (later lines have
 * larger values)
 * duration (col. 3) must be <= wc_limit (col. 4)
 * all users (col. 1) should be added beforehand, e.g.
 *   ./slurm_programs/bin/sacctmgr -i add user user330 accounts=bsc
 *   ./slurm_programs/bin/sacctmgr -i add user user642 accounts=bsc
 * the account ("bsc" here) is the same one you pass on the command line
 * when executing trace_builder
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <getopt.h>
#include <string.h>

#include "sim_trace.h"

/* this program creates a file with a set of job traces for slurm simulator */

static struct option long_options[] = {
    {"jobs",  1, 0, 'j'},
    {"partition",  1, 0, 'p'},
    {"account",     1, 0, 'a'},
    {"cpus_per_task",     1, 0, 'm'},
    {"tasks_per_node",     1, 0, 't'},
    {"submit_time",     1, 0, 's'},
    {NULL,       0, 0, 0}
};

int trace_file;
int job_counter = 1001;
int total_jobs = 0;
int submit_time = 0;
int cpus_per_task = 0;
int tasks_per_node = 0;

job_trace_t new_trace;

char *username[1000];
int total_users = 0;

char *qosname[1000];
int total_qos = 0;

char *default_partition;
char *default_account;

/* reading one line from logfile
 * and filling global structure new_trace
 *
 * input - file descriptor of the log file
 * output - 0 if all correct (or at end of file), -1 if something is wrong
 * other - modifies global variable "new_trace"
 */
int read_from_log(int logfile)
{
	/* stack buffer: nothing to free on the early returns below */
	char line_read[256];
	int char_pos = 0;
	int parsed_word = 0;

	//reading from log file one char at a time until eof or end of line
	//(== 1 also stops the loop on a read error, which returns -1)
	while (read(logfile, &line_read[char_pos], 1) == 1) {
		//skipping spaces
		if (line_read[char_pos] == ' ')
			continue;
		if (line_read[char_pos] == ';' || line_read[char_pos] == '\n') {
			int end_of_line = (line_read[char_pos] == '\n');
			//skipping empty lines
			if (end_of_line && parsed_word == 0 && char_pos == 0)
				continue;
			line_read[char_pos] = '\0';
			switch (parsed_word) {
				case 0:
					snprintf(new_trace.username, char_pos+1, "%s", line_read);
					break;
				case 1:
					new_trace.submit = atoi(line_read) + submit_time;
					break;
				case 2:
					new_trace.duration = atoi(line_read);
					break;
				case 3:
					new_trace.wclimit = atoi(line_read);
					break;
				case 4:
					new_trace.tasks = atoi(line_read);
					break;
				default:
					printf("More params in log line than expected\n");
					return -1;
			}
			parsed_word++;
			char_pos = 0;
			if (end_of_line) {
				if (parsed_word != 5) {
					printf("Fewer params in log line than expected\n");
					return -1;
				}
				return 0;
			}
		}
		else if (++char_pos >= (int)sizeof(line_read)) {
			printf("Field too long in log line\n");
			return -1;
		}
	}
	//last line without trailing newline: the tasks field is still pending
	if (parsed_word == 4 && char_pos > 0) {
		line_read[char_pos] = '\0';
		new_trace.tasks = atoi(line_read);
	}
	//at eof new_trace keeps its previous values, so the last job is repeated
	return 0;
}

int main(int argc, char *argv[]){

		int i;
		int written;
		int userfile;
		int qosfile;
		int logfile;
		int char_pos;
		int endfile = 0;
		char *name;
        int option_index;
        int opt_char;

        /* each short option takes an argument, hence the ':' after it */
        while((opt_char = getopt_long(argc, argv, "j:p:a:m:t:s:",
                        long_options, &option_index)) != -1) {
            switch (opt_char) {
                case (int)'j':
                    total_jobs = atoi(optarg);
                    break;

                case (int)'p':
                    default_partition = strdup(optarg);
                    break;

                case (int)'a':
                    default_account = strdup(optarg);
                    break;

                case (int)'m':
                    cpus_per_task = atoi(optarg);
                    break;

                case (int)'t':
                    tasks_per_node = atoi(optarg);
                    break;

                case (int)'s':
                    submit_time = atoi(optarg);
                    break;

                default:
                    fprintf(stderr, "getopt error, returned %c\n",
                            opt_char);
                    exit(1);
            }
        }

        if(	total_jobs == 0 ||
		default_partition == NULL ||
		default_account == NULL ||
		cpus_per_task == 0 || 
		tasks_per_node == 0 ||
		submit_time == 0
		){
            printf("Usage: %s --jobs=xx --partition=xxxx --account=xxxx --cpus_per_task=xx --tasks_per_node=xx --submit_time=xx(unixtime)\n", argv[0]);
			printf("Ex: %s --jobs=10000 --partition=projects --account=bsc --cpus_per_task=1 --tasks_per_node=1 --submit_time=1300000000\n", argv[0]);
            return -1;
        }
		// parsing file qos.sim
		qosfile = open("qos.sim", O_RDONLY);

		if(qosfile < 0){
			printf("qos.sim file not found\n");
			return -1;
		}

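		// parsing qosfile: one qos name per line
		endfile = 0;
		while(!endfile && total_qos < 1000){
		  	char_pos = 0;
			name = malloc(30);
			if(name == NULL){
				printf("Out of memory reading qos.sim\n");
				return -1;
			}
			while(1){
				if(read(qosfile, &name[char_pos], 1) <= 0){
					endfile = 1;
					break;
				}
				if(name[char_pos] == '\n')
					break;
				// names longer than 29 chars are truncated
				if(char_pos < 29)
					char_pos++;
			}
			name[char_pos] = '\0';
			// skipping empty lines and the empty read at eof
			if(char_pos == 0){
				free(name);
				continue;
			}
			qosname[total_qos] = name;
			printf("Reading qos: %s\n", qosname[total_qos]);
			total_qos++;
		}
		close(qosfile);

		// total_qos is used as a divisor when assigning qos to jobs
		if(total_qos == 0){
			printf("qos.sim does not contain any qos name\n");
			return -1;
		}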

		// opening test.trace
		if((trace_file = open("test.trace", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)) < 0){
				printf("Error opening file test.trace\n");
				return -1;
		}
		
		// opening logfile
		if ( (logfile = open("log.csv", O_RDONLY)) < 0){
			printf("Error opening file log.csv\n");
			return -1;
		}

		// reading total_jobs lines from logfile
		// if total_jobs > lines in logfile, the last logfile line is duplicated until total_jobs jobs have been written
		for(i = 0; i < total_jobs; i++){

				if (read_from_log(logfile) < 0){
					printf("Error reading log.csv on line %i\n", i);
					return -1;
				}
				int j;

				new_trace.job_id = job_counter++;

				/* never pass file contents as the format string */
				sprintf(new_trace.qosname, "%s", qosname[i % total_qos]);
			
				sprintf(new_trace.account, "%s", default_account);
				
				new_trace.cpus_per_task = cpus_per_task;
				
				sprintf(new_trace.partition, "%s", default_partition);
				
				new_trace.tasks_per_node = tasks_per_node;

				written = write(trace_file, &new_trace, sizeof(new_trace));

				printf("JOB(%s): %d, %d, %d %d\n", new_trace.username, job_counter - 1, new_trace.duration, new_trace.tasks, new_trace.cpus_per_task);
				if(written != sizeof(new_trace)){
						printf("Error writing to file: %d of %zu\n", written, sizeof(new_trace));
						return -1;
				}

		}

		return 0;
}
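
The log.csv format described in the header comment of the program above
("loginname; submit_time; duration; wc_limit; tasks", spaces skipped,
duration <= wc_limit) can be checked before building a trace. The
following is only a sketch under assumptions: log_entry_t and
parse_log_line are hypothetical names that are not part of the simulator
or of sim_trace.h, and the 31-character login limit is an arbitrary
choice made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one log.csv line; NOT the simulator's job_trace_t. */
typedef struct {
    char login[32];
    long submit_time;
    long duration;
    long wc_limit;
    long tasks;
} log_entry_t;

/*
 * Parse one "loginname; submit_time; duration; wc_limit; tasks" line.
 * Returns 0 on success, -1 if the line is malformed or duration > wc_limit.
 */
int parse_log_line(const char *line, log_entry_t *out)
{
    char buf[256];
    size_t n = 0;

    /* Copy the line without spaces, stopping at the line ending. */
    for (const char *p = line; *p != '\0' && *p != '\n' && *p != '\r'; p++) {
        if (*p == ' ')
            continue;
        if (n + 1 >= sizeof(buf))
            return -1;                /* line too long */
        buf[n++] = *p;
    }
    buf[n] = '\0';

    /* First field: login name, up to the first ';'. */
    char *sep = strchr(buf, ';');
    if (sep == NULL || sep == buf ||
        (size_t)(sep - buf) >= sizeof(out->login))
        return -1;
    memcpy(out->login, buf, (size_t)(sep - buf));
    out->login[sep - buf] = '\0';

    /* Remaining four fields: non-negative integers separated by ';'. */
    long values[4];
    char *cursor = sep + 1;
    for (int i = 0; i < 4; i++) {
        char *end;
        values[i] = strtol(cursor, &end, 10);
        if (end == cursor || values[i] < 0)
            return -1;                /* missing or negative number */
        if (i < 3) {
            if (*end != ';')
                return -1;            /* too few columns */
            cursor = end + 1;
        } else if (*end != '\0') {
            return -1;                /* extra columns or trailing junk */
        }
    }

    out->submit_time = values[0];
    out->duration    = values[1];
    out->wc_limit    = values[2];
    out->tasks       = values[3];

    /* Rule from the header comment: duration must be <= wc_limit. */
    if (out->duration > out->wc_limit)
        return -1;
    return 0;
}
```

A caller could read log.csv with fgets, pass each line to
parse_log_line, and also compare submit_time with the previous line's
value to check the ordering rule from the header comment.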
