Related to this, is it possible to know how much CPU and memory the job is currently using?
2012/1/24 Mark Nelson <mdnels...@gmail.com>
> Hi Moe,
>
> That was exactly what we were after. (Hmmm, I really should read through
> the man pages more carefully...)
>
> I'll pass on the idea of adding an XML output option.
>
> Many thanks!
> Mark.
>
> On 25/01/12 03:52, Moe Jette wrote:
>> Use the command "scontrol show job --detail". The output will contain a
>> line like this for each node allocated to each job:
>>   Nodes=tux123 CPU_IDs=2-5 Mem=2048
>> While the data does exist, it is not going to be particularly simple to
>> parse and work with. There has been talk about adding an "--xml" option
>> for XML output from scontrol, but that has never been done. Since SLURM
>> is open source, you could modify scontrol to add an "--xml" option or
>> build a new tool for your particular application.
>>
>> Moe Jette
>> SchedMD
>>
>> Quoting Mark Nelson <mdnels...@gmail.com>:
>>
>>> Hi there,
>>>
>>> My colleague came up with the question below about running jobs on a
>>> normal x86-based cluster. Hopefully someone here can shed some light
>>> on this.
>>>
>>> When running SLURM on a multi-core/multi-socket cluster, is there any
>>> way of finding out which cores are allocated to a particular job?
>>> Using "scontrol show job" I can find out which nodes are allocated and
>>> the total number of cores, but I have no way of knowing how these
>>> cores are distributed across the nodes. While the system seems to
>>> allocate cores consecutively, across multiple jobs there is no way of
>>> knowing which cores are assigned to which job. For example, on an
>>> 8-core multi-node system, if I ask for 3 cores across 2 nodes
>>> (salloc -n 3 -N 2), how do I know whether 2 cores are allocated from
>>> the first node and 1 core from the second, or vice versa? Also, as
>>> nodes fill up with other jobs, and jobs finish at different times,
>>> there is no way of mapping jobs to particular cores.
>>>
>>> I've seen from other postings that SLURM core numbering might not
>>> match the physical hardware core numbering, but for my purposes this
>>> is not a problem, as long as the numbering is consistent.
>>>
>>> The reason I'm asking this question is that I'm trying to integrate
>>> SLURM with PTP (Eclipse Parallel Tools Platform) system monitoring,
>>> which expects to map jobs to nodes and cores in a graphical
>>> interface. Therefore, for jobs on a multi-core cluster, I need to
>>> report which nodes and cores a particular job is running on, in a
>>> specified XML format.
>>>
>>> Many thanks!
>>> Mark.
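As a follow-up: the per-node "Nodes=... CPU_IDs=... Mem=..." lines Moe describes are easy enough to pull apart with a small script. Here is a minimal sketch in Python; the regex and the sample text are assumptions based only on the format quoted above (note that on some configurations the Nodes= field may itself be a bracketed hostlist like tux[123-124], which this sketch does not expand):

```python
import re

def parse_allocation(scontrol_output):
    """Map node name -> list of allocated core IDs.

    Parses lines of the form 'Nodes=tux123 CPU_IDs=2-5 Mem=2048' as
    produced by 'scontrol show job --detail' (format assumed from the
    example in the thread above).
    """
    alloc = {}
    for match in re.finditer(r"Nodes=(\S+)\s+CPU_IDs=(\S+)\s+Mem=(\d+)",
                             scontrol_output):
        node, cpu_ids, _mem = match.groups()
        cores = []
        # CPU_IDs may be a comma-separated mix of single IDs and ranges,
        # e.g. '0,6-7'; expand ranges into explicit core numbers.
        for part in cpu_ids.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                cores.extend(range(int(lo), int(hi) + 1))
            else:
                cores.append(int(part))
        alloc[node] = cores
    return alloc

# Hypothetical sample output for two allocated nodes:
sample = ("   Nodes=tux123 CPU_IDs=2-5 Mem=2048\n"
          "   Nodes=tux124 CPU_IDs=0,6-7 Mem=1024")
print(parse_allocation(sample))
# {'tux123': [2, 3, 4, 5], 'tux124': [0, 6, 7]}
```

From a dict like this it is straightforward to emit whatever XML layout PTP expects for its node/core display.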