Re: [slurm-dev] Squeue taking 2+ minutes with 30K jobs

Moe Jette Mon, 07 Nov 2011 13:46:57 -0800

Your patch is good. That's quite an improvement!


Quoting Clay Teeter <[email protected]>:

Here is my patch.  Does this look ok, for now anyway?

And, 20K jobs are now beings pulled at around 1s !!!!!!

Thanks,
Clay

+++ /home/cteeter/sort.c 2011-11-07 13:26:56.000000000 -0800
@@ -89,6 +89,8 @@
 static int _sort_step_by_user_id(void *void1, void *void2);
 static int _sort_step_by_user_name(void *void1, void *void2);

+static time_t now;
+

/*****************************************************************************

  * Global Print Functions

*****************************************************************************/
@@ -96,6 +98,7 @@
 void sort_job_list(List job_list)
 {
  int i;
+ now = time(NULL);

  if (params.sort == NULL)
  params.sort = xstrdup("P,t,-p"); /* Partition,state,priority */
@@ -583,8 +586,6 @@

 static uint32_t _get_start_time(job_info_t *job)
 {
- time_t now = time(NULL);
-
  if (job->start_time == (time_t) 0)
  return 0xffffffff;
  if ((job->job_state == JOB_PENDING) && (job->start_time < now))
@@ -853,7 +854,6 @@
  job_step_info_t *step2 = (job_step_info_t *) void2;
  time_t now, used1, used2;

- now = time(NULL);
  used1 = difftime(now, step1->start_time);
  used2 = difftime(now, step2->start_time);
  diff = used1 - used2;

On Mon, Nov 7, 2011 at 1:29 PM, Moe Jette <[email protected]> wrote:

Note: This patch will not work with the "-i" or "--iterate" option as
"now" would need to be reset for each iteration.


Quoting "Mark A. Grondona" <[email protected]>:

 On Mon, 7 Nov 2011 13:00:00 -0800, Rayson Ho <[email protected]>

wrote:

On Mon, Nov 7, 2011 at 3:33 PM, Clay Teeter <[email protected]> wrote:
> A couple of things to mention about the cluster (slow) node.
> cat /proc/sys/kernel/vsyscall64 = 0

Yup, if /proc/sys/kernel/vsyscall64 is 0, then gettimeofday() and
friends are real system calls...

So doing ~1B real system call to get the same time(2) data is really
bad for performance!

> the timing loop posted by Mark took > 1m
> Ill implement the suggestions by Mark and Rayson and see what happens!

And replying to Mark's previous email: I agree with Mark that we need
to call time(2) once and save it somewhere, but it is a bigger change
and I am not sure if we will need a newer SLURM version to ship the
change with. The patch I sent is a second choice, my first suggestion
in the email was to call time(2) once and pass the time data into the
comparsion function (but still would require some modifications in
various places).


The following patch is a stopgap that should only initialize
"now" once, preventing unecessary calls to time(2), and ensuring
the value for "now" doesn't change during a sort:

It isn't ideal since there will be an extraneous comparison
for each call to get_start_time, but it should reduce the calls
to time(2) to 1.


From 318bc74372b7a73b0936e47a744144**8b4a693ed7 Mon Sep 17 00:00:00 2001
From: Mark A. Grondona <[email protected]>
Date: Mon, 7 Nov 2011 13:19:16 -0800
Subject: [PATCH] squeue: Only call time(2) once when sorting by job start
time

Since the list_sort algorithm is O(n^2), we want to avoid any
unecessary work in sort compare functions, especially work
that is duplicated. In the specific case of _get_start_time(),
we also want to avoid returning a different start time for
PENDING jobs during a single sort (which may happen if the
sort takes >1s).

So, as a stopgap measure, define "now" as static in _get_start_time
so that we only initialize it once.
---
 src/squeue/sort.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/squeue/sort.c b/src/squeue/sort.c
index af194ef..b75dc99 100644
--- a/src/squeue/sort.c
+++ b/src/squeue/sort.c
@@ -583,12 +583,16 @@ static int _sort_job_by_time_limit(void *void1,
void *void2)

 static uint32_t _get_start_time(job_info_t *job)
 {
-       time_t now = time(NULL);
+       static time_t now = (time_t) 0;

       if (job->start_time == (time_t) 0)
               return 0xffffffff;
-       if ((job->job_state == JOB_PENDING) && (job->start_time < now))
-               return (uint32_t) now;
+       if (job->job_state == JOB_PENDING) {
+               if (now == (time_t) 0)
+                       now = time(NULL);
+               if (job->start_time < now)
+                       return (uint32_t) now;
+       }
       return (uint32_t) job->start_time;
 }

--
1.7.1






 Rayson




> Thanks,
> Clay
> On Sat, Nov 5, 2011 at 8:08 AM, Mark A. Grondona <[email protected]>
wrote:
>>
>> On Fri, 4 Nov 2011 20:33:31 -0700, Rayson Ho <[email protected]>
>> wrote:
>> > On Fri, Nov 4, 2011 at 9:58 PM, Mark A. Grondona <
[email protected]>
>> > wrote:
>> > > This is  definitely true, but the fact that Clay's debugger caught
>> > > squeue in the time(2) system call means something strange might be
>> > > going on with that call.
>> >
>> > That's possible, if it is running on an old Linux kernel, or if
>> > /proc/sys/kernel/vsyscall64 is 0.
>>
>> Also depends on the timer used by gettimeofday.
>>
>> > (And BTW, that's 100M calls)
>>
>> Oh, yeah, I increased it to get >1s runtime on my system. sorry!
>>
>> > > Clay, you might try running the following
>> > > benchmark on your system just to see how long 10M time(2) calls
take:
>> >
>> > According to the comments in src/common/list.c, list_sort() has a
>> > complexity of O(n^2). So 30K jobs with complexity of n^2, it's like
>> > 900M calls to time(2)!!
>>
>> Yeah, many of the sort compare functions in squeue have this problem.
>> Lots of code needlessly recomputes the same values over and over again
>> (n^2 - n times, right?)
>>
>> > If that really is the case, then it would be slow to execute almost
1
>> > billion time(2) calls even if the overhead is really low for each
>> > call. (In the end, a VDSO call is not free)
>>
>> FWIW, I tried to reproduce Clay's problem on a test system
>> (though I only got up to 20K jobs), and was not able to.
>>
>>
>> > I am not sure how job->start_time gets assigned in the code (the
>> > header says "time_t start_time;      /* time execution begins,
actual
>> > or expected */", so I would expect it to be unset for pending jobs
>> > without expected start time. I don't have a SLURM cluster that I can
>> > fool around with and to attach a debugger, yet there are too many
>> > start_time variables in the code that is blocking me from digging
>> > deeper!!).
>> >
>> > - If job->start_time really represents the expected start time for
>> > every job, then I guess calling time(2) once and passing the value
to
>> > the comparsion function would make sense.
>> >
>> > - But if the common case start_time has a value of 0, then the
>> > following patch should help (I've checked with gcc 4.4.5, even at
-O3,
>> > there's not sinking of time(NULL) below the first if statement, so
we
>> > need to do it ourselves):
>>
>> Your patch is a good start I would think. However, the start_time
>> is still needlessly recomputed for every pending job (I _think_
>> pending jobs start_time is set to submit time, until they actually
>> start). So if you have 30K pending jobs you'd be in the same boat.
>>
>> Also, think about what happens when the sort takes >1s. The start
>> time of these pending jobs starts to change! That seems wrong.
>>
>> The best fix for this one problem (if it is the problem) would be
>> to set a global "now" that doesn't change, so the time(2) call
>> only needs to be made once, and every pending job gets the same
>> value for start_time.
>>
>> In general, the sorting and other code in squeue could be greatly
>> improved by precomputing results or otherwise memoizing results
>> (i.e. store results in a hash that could be checked before
>> recomputing the same result you had before). A good example
>> is _sort_job_by_node_list(). That function does 'a lot' of work,
>> and like Rayson said, it could be called up to 900M times
>> needlessly for 30K jobs!
>>
>> BTW, in my testing, profiling showed that the slowest part of
>> squeue was the uid_to_string() call, which is called at least
>> once for every job (in my case, every job had the same uid,
>> so that call should have only been made once). Removing that call
>> more than halved the runtime of squeue.
>>
>> mark
>>
>> >
>> > [rayson@computer slurm-2.3.0-2]$ diff -U4 sort.c.original
>> > src/squeue/sort.c
>> > --- sort.c.original     2011-11-04 22:47:02.001064100 -0400
>> > +++ src/squeue/sort.c   2011-11-04 22:47:55.187063769 -0400
>> > @@ -582,13 +582,13 @@
>> >  }
>> >
>> >  static uint32_t _get_start_time(job_info_t *job)
>> >  {
>> > -       time_t now = time(NULL);
>> > +       time_t now;
>> >
>> >         if (job->start_time == (time_t) 0)
>> >                 return 0xffffffff;
>> > -       if ((job->job_state == JOB_PENDING) && (job->start_time <
now))
>> > +       if ((job->job_state == JOB_PENDING) && (job->start_time <
(now =
>> > time(NULL))))
>> >                 return (uint32_t) now;
>> >         return (uint32_t) job->start_time;
>> >  }
>> >
>> > Rayson
>> >
>> >
>> > >
>> > >  #include <time.h>
>> > >  int main (int ac, char **av)
>> > >  {
>> > >        unsigned long i;
>> > >        time_t t;
>> > >        for (i = 0; i < 100000000UL; i++)
>> > >            t = time(NULL);
>> > >            return 0;
>> > >  }
>> > >
>> > > On my system I get:
>> > >
>> > >  $ time ./timebench
>> > >
>> > >  real    0m3.794s
>> > >  user    0m3.791s
>> > >
>> > >
>> > > mark
>> > >
>> > >
>> > >
>> > >> Rayson
>> > >>
>> > >>
>> > >>
>> > >> > Moe, Danny, what do you think about that problem ?
>> > >> >
>> > >> > Regards,
>> > >> > Matthieu
>> > >> >
>> > >> >
>> > >> > On 11/04/2011 08:53 PM, Clay Teeter wrote:
>> > >> >
>> > >> > Ok, here is some general profiling info:
>> > >> > the slurmctl process doesn't seem too terribly out of whack.
 There
>> > >> > are
>> > >> > bursts of 100% cpu usage, but they lasted no longer then a
couple
>> > >> > seconds.
>> > >> > I ran gstack on the slurmctl process while squeue was running.
>> > >> >  Again, I
>> > >> > didn't see anything that look strange:
>> > >> > Thread 5 (Thread 0x41fc6940 (LWP 21939)):
>> > >> > #0  0x00002aaaaacb9df8 in select_p_job_test () from
>> > >> > /usr/lib64/slurm/select_cons_**res.so
>> > >> > #1  0x00002aaaabad8cf3 in backfill_agent () from
>> > >> > /usr/lib64/slurm/sched_**backfill.so
>> > >> > #2  0x0000003cc8e0673d in start_thread () from
>> > >> > /lib64/libpthread.so.0
>> > >> > #3  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
>> > >> > Thread 4 (Thread 0x420c7940 (LWP 21941)):
>> > >> > #0  0x0000003cc86cd722 in select () from /lib64/libc.so.6
>> > >> > #1  0x000000000042daa2 in _slurmctld_rpc_mgr ()
>> > >> > #2  0x0000003cc8e0673d in start_thread () from
>> > >> > /lib64/libpthread.so.0
>> > >> > #3  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
>> > >> > Thread 3 (Thread 0x421c8940 (LWP 21942)):
>> > >> > #0  0x0000003cc8e0e898 in do_sigwait () from
/lib64/libpthread.so.0
>> > >> > #1  0x0000003cc8e0e93d in sigwait () from
/lib64/libpthread.so.0
>> > >> > #2  0x000000000042b838 in _slurmctld_signal_hand ()
>> > >> > #3  0x0000003cc8e0673d in start_thread () from
>> > >> > /lib64/libpthread.so.0
>> > >> > #4  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
>> > >> > Thread 2 (Thread 0x422c9940 (LWP 21943)):
>> > >> > #0  0x0000003cc8e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
from
>> > >> > /lib64/libpthread.so.0
>> > >> > #1  0x000000000046a9c1 in slurmctld_state_save ()
>> > >> > #2  0x0000003cc8e0673d in start_thread () from
>> > >> > /lib64/libpthread.so.0
>> > >> > #3  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
>> > >> > Thread 1 (Thread 0x2aae5df000d0 (LWP 21933)):
>> > >> > #0  0x0000003cc8e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 ()
from
>> > >> > /lib64/libpthread.so.0
>> > >> > #1  0x0000000000449cbf in _wr_wrlock ()
>> > >> > #2  0x000000000044a1d5 in lock_slurmctld ()
>> > >> > #3  0x000000000042af5e in _slurmctld_background ()
>> > >> > #4  0x000000000042cb8c in main ()
>> > >> > However, i did a gstack and strace on squeue and things did
look a
>> > >> > bit weird.
>> > >> > strace:
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > time(NULL)                              = 1320435240
>> > >> > .....
>> > >> > keeps on going, this might be the sorting algo?
>> > >> >
>> > >> > gstack:
>> > >> > #0  0xffffffffff600410 in ?? ()
>> > >> > #1  0x0000003cc868c40d in time () from /lib64/libc.so.6
>> > >> > #2  0x000000000042608b in _get_start_time ()
>> > >> > #3  0x00000000004260e1 in _sort_job_by_time_start ()
>> > >> > #4  0x0000000000437958 in slurm_list_sort ()
>> > >> > #5  0x00000000004232af in print_jobs_array ()#6
 0x0000000000420234
>> > >> > in
>> > >> > _get_info ()
>> > >> > #7  0x00000000004205a1 in main ()
>> > >> > squeue is pegged at 100% the entire time.  I noticed, that if i
>> > >> > pass in the
>> > >> > param -S u to squeue the function _sort_job_by_time_start is
still
>> > >> > executed.
>> > >> > Some interetesed slurmctl logs enteries:
>> > >> > [2011-11-04T19:50:37] Warning: Note very large processing time
from
>> > >> > _slurm_rpc_dump_jobs: usec=6155478
>> > >> > [2011-11-04T19:50:37] Warning: Note very large processing time
from
>> > >> > schedule: usec=6158333
>> > >> > [2011-11-04T19:50:37] Warning: Note very large processing time
from
>> > >> > _slurmctld_background: usec=15995841
>> > >> >
>> > >> > [2011-11-04T19:46:01] debug:  backfill: loop taking too long,
>> > >> > yielding locks
>> > >> > [2011-11-04T19:46:01] debug2: Testing job time limits and
>> > >> > checkpoints
>> > >> > [2011-11-04T19:46:01] debug2: Performing purge of old job
records
>> > >> > [2011-11-04T19:46:01] Warning: Note very large processing time
from
>> > >> > _slurmctld_background: usec=9779576
>> > >> > Thanks for all your help,
>> > >> > Clay
>> > >> > On Fri, Nov 4, 2011 at 2:16 AM, HAUTREUX Matthieu
>> > >> > <[email protected]>
>> > >> > wrote:
>> > >> >>
>> > >> >> In my understanding of the internals, squeue does not query
each
>> > >> >> node but
>> > >> >> only the controller. That is interesting to know that the
>> > >> >> performance issue
>> > >> >> seems to be due to the EC2 partition but it requires further
>> > >> >> information to
>> > >> >> know what is the reason of such a bad performance.
>> > >> >>
>> > >> >> The best way to find the problem is to observe the internal
states
>> > >> >> of the
>> > >> >> controller while the "squeue -p aws" is running :
>> > >> >> - You should first look at the consumption of the slurmctld
>> > >> >> process to see
>> > >> >> if it is due to a too high complexity of processing in an
internal
>> > >> >> function
>> > >> >> of slurm.
>> > >> >>  The easiest way to have that information is by using top and
look
>> > >> >> at the
>> > >> >> consumption of the slurmctld process.
>> > >> >> - After that, in case you see no anormal consumption, you
should
>> > >> >> try to
>> > >> >> capture the state of the slurmctld process during the hang.
You
>> > >> >> can do that
>> > >> >> just by using the gstack command to capture the stacks of
>> > >> >> slurmctld threads
>> > >> >> and see where they are waiting. If you can send the result of
>> > >> >> "gstack
>> > >> >> $(pgrep slurmctld)" it could bring interesting information.
>> > >> >> - In case you see an anormal consumption, the gstack can help
a
>> > >> >> little
>> > >> >> bit, but a more evolved tool, a profiler like for example
>> > >> >> oprofile, will be
>> > >> >> necessary to see where most of the time is spent in the
controller
>> > >> >>
>> > >> >> Do not hesitate to send the result of the 2 first tests (top,
>> > >> >> gstack), it
>> > >> >> could help to find the bottleneck.
>> > >> >>
>> > >> >> Regards,
>> > >> >> Matthieu
>> > >> >>
>> > >> >> Clay Teeter a écrit :
>> > >> >>>
>> > >> >>> I did notice something.
>> > >> >>>
>> > >> >>> Issuing squeue against the local partition is quite speedy,
~2s
>> > >> >>> for 10K
>> > >> >>> job ids.  However, if i "squeue -p aws" the ec2(aws)
partition i
>> > >> >>> see the 2m+
>> > >> >>> delay.  Does squeue query each node in the partition?
>> > >> >>>
>> > >> >>> Thanks,
>> > >> >>> Clay
>> > >> >>>
>> > >> >>> On Thu, Nov 3, 2011 at 2:42 PM, Clay Teeter <
[email protected]
>> > >> >>> <mailto:[email protected]>> wrote:
>> > >> >>>
>> > >> >>>    First, thanks for your response!
>> > >> >>>
>> > >> >>>    Yeah, I figured that slurm shouldn't have a problem with
the
>> > >> >>> job
>> > >> >>>    count.
>> > >> >>>    First some background.   Our network is split into two
parts;
>> > >> >>> A
>> > >> >>>    local partition and a ec2 partition.  And, for each task
two
>> > >> >>> jobs
>> > >> >>>    (one is dependent on the other) are submitted.  The frist
part
>> > >> >>>    computes a large datafile and submits it to s3, a second
job
>> > >> >>> pulls
>> > >> >>>    the files from s3 and copies it to our san. In total a
large
>> > >> >>> job
>> > >> >>>    requires about 30K jobs to be submitted.
>> > >> >>>
>> > >> >>>    Im not sure when squeue started slowing dow.  Currently,
via
>> > >> >>> sattr
>> > >> >>>    (about 2s query), there are a little more than 17000 jobs
>> > >> >>> running.
>> > >> >>>
>> > >> >>>    Im worried that because half the cluster is in a remote
>> > >> >>> datacenter
>> > >> >>>    latency is the problem.... which, unfortunately, would be
a
>> > >> >>> show
>> > >> >>>    stopper.
>> > >> >>>
>> > >> >>>    Version: slurm 2.3.0-2
>> > >> >>>    ControlMachine=master
>> > >> >>>    AuthType=auth/munge
>> > >> >>>    CacheGroups=0
>> > >> >>>    CryptoType=crypto/munge
>> > >> >>>    MaxJobCount=30000     MpiDefault=none
>> > >> >>>    MpiParams=ports=12000-12999
>> > >> >>>    ProctrackType=proctrack/pgid
>> > >> >>>    ReturnToService=1
>> > >> >>>    SlurmctldPidFile=/var/run/**slurm/slurmctld.pid
>> > >> >>>    SlurmctldPort=6817
>> > >> >>>    SlurmdPidFile=/var/run/slurm/**slurmd.pid
>> > >> >>>    SlurmdPort=6818
>> > >> >>>    SlurmdSpoolDir=/var/slurm/**spool
>> > >> >>>    SlurmUser=slurm
>> > >> >>>    StateSaveLocation=/var/slurm/**state
>> > >> >>>    SwitchType=switch/none
>> > >> >>>    TaskPlugin=task/none
>> > >> >>>    TrackWCKey=yes
>> > >> >>>    InactiveLimit=0
>> > >> >>>    KillWait=30
>> > >> >>>    MinJobAge=300
>> > >> >>>    SlurmctldTimeout=120
>> > >> >>>    SlurmdTimeout=300
>> > >> >>>    Waittime=0
>> > >> >>>    FastSchedule=1
>> > >> >>>    SchedulerType=sched/backfill
>> > >> >>>    SchedulerPort=7321
>> > >> >>>    SelectType=select/cons_res
>> > >> >>>    SelectTypeParameters=CR_Core
>> > >> >>>    AccountingStorageHost=master
>> > >> >>>    AccountingStorageLoc=slurm_**acct_db
>> > >> >>>    AccountingStoragePass=
>> > >> >>>    AccountingStorageType=**accounting_storage/mysql
>> > >> >>>    AccountingStorageUser=
>> > >> >>>    AccountingStoreJobComment=YES
>> > >> >>>    ClusterName=hydra
>> > >> >>>    JobCompHost=master
>> > >> >>>    JobCompLoc=slurm_acct_db
>> > >> >>>    JobCompPass=cluster_service
>> > >> >>>    JobCompType=jobcomp/mysql
>> > >> >>>    JobCompUser=cluster_service
>> > >> >>>    JobAcctGatherFrequency=30
>> > >> >>>    JobAcctGatherType=jobacct_**gather/linux
>> > >> >>>    SlurmctldDebug=3
>> > >> >>>    SlurmctldLogFile=/var/log/**slurm/slurmctld.log
>> > >> >>>    SlurmdDebug=3
>> > >> >>>    SlurmdLogFile=/var/log/slurm/**slurmd.log
>> > >> >>>    NodeName=aws[1-12] RealMemory=22528 Sockets=2
CoresPerSocket=4
>> > >> >>>    ThreadsPerCore=2 State=UNKNOWN     NodeName=centos1
>> > >> >>> RealMemory=3000
>> > >> >>> Sockets=2 CoresPerSocket=2
>> > >> >>>    ThreadsPerCore=1 State=UNKNOWN
>> > >> >>>
>> > >> >>>    PartitionName=local Nodes=centos1 Default=YES
MaxTime=INFINITE
>> > >> >>>    State=UP Shared=NO
>> > >> >>>    PartitionName=all Nodes=aws[3-12],centos1 Default=NO
>> > >> >>>    MaxTime=INFINITE State=UP Shared=NO
>> > >> >>>    PartitionName=aws Nodes=aws[2-12] Default=NO
MaxTime=INFINITE
>> > >> >>>    State=UP Shared=NO
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>>    On Thu, Nov 3, 2011 at 2:03 PM, Matthieu Hautreux
>> > >> >>>    <[email protected] <mailto:matthieu.hautreux@cea.*
*fr <[email protected]>>>
>> > >> >>> wrote:
>> > >> >>>
>> > >> >>>        Hi,
>> > >> >>>
>> > >> >>>        what is the version of SLURM that you are using ? Can
you
>> > >> >>> send
>> > >> >>>        your slurm.conf for details of configuration ?
>> > >> >>>
>> > >> >>>        With 2.3.1 you should be able to manage at least 10k
jobs.
>> > >> >>> We
>> > >> >>>        made tests with 2.2.7 and backported patches of 2.3.x
that
>> > >> >>>        shows good results with 10K jobs.
>> > >> >>>        What is the number of jobs for which the system
starts to
>> > >> >>> hang ?
>> > >> >>>
>> > >> >>>        Matthieu
>> > >> >>>
>> > >> >>>
>> > >> >>>        On 11/03/2011 07:03 PM, Clay Teeter wrote:
>> > >> >>>>
>> > >> >>>>        Hi group!
>> > >> >>>>
>> > >> >>>>        Im seeing that with about 30K (or so) jobs in the
queue,
>> > >> >>>>        squeue hangs
>> > >> >>>>        for a around 2 minutes before returning values.  Does
>> > >> >>>> anyone
>> > >> >>>>        know a way
>> > >> >>>>        around this?  I expect that this isn't standard
behavior.
>> > >> >>>>
>> > >> >>>>        Cheers,
>> > >> >>>>        Clay
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>
>> > >> >
>> > >> >
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >>
>> > >> ==============================**====================
>> > >> Open Grid Scheduler - The Official Open Source Grid Engine

>> > >>http://gridscheduler.**sourceforge.net/<http://gridscheduler.sourceforge.net/>

>> > >>
>> > >
>> >
>> >
>> >
>> > --
>> >
>> > ==============================**====================
>> > Open Grid Scheduler - The Official Open Source Grid Engine

>> >http://gridscheduler.**sourceforge.net/<http://gridscheduler.sourceforge.net/>

>> >
>>
>
>



--

==============================**====================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.**sourceforge.net/<http://gridscheduler.sourceforge.net/>

Re: [slurm-dev] Squeue taking 2+ minutes with 30K jobs

Reply via email to