On Mon, 7 Nov 2011 13:29:27 -0800, Moe Jette <[email protected]> wrote:
> Note: This patch will not work with the "-i" or "--iterate" option as
> "now" would need to be reset for each iteration.

That is a good point. That patch should not be used. Instead,
a global time_t now should be defined and re-initialized at each
iteration.
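Something like the following sketch is what I have in mind (names
are made up for illustration, not actual SLURM identifiers):

```c
#include <assert.h>
#include <time.h>

/* Sketch: one file-scope timestamp, refreshed once per --iterate
 * pass, so every sort comparison in that pass sees the same "now"
 * and time(2) is called once per pass instead of O(n^2) times. */
static time_t g_now;

/* Call this at the top of each iteration of the main loop. */
static void refresh_now(void)
{
	g_now = time(NULL);
}

/* Sort compare helpers read this instead of calling time(NULL). */
static time_t get_now(void)
{
	return g_now;
}
```

Each pass of the main loop would call refresh_now() once, and all of
the compare functions would read get_now() rather than calling
time(NULL) themselves.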

Thanks,
mark

 
> Quoting "Mark A. Grondona" <[email protected]>:
> 
> > On Mon, 7 Nov 2011 13:00:00 -0800, Rayson Ho <[email protected]> wrote:
> >> On Mon, Nov 7, 2011 at 3:33 PM, Clay Teeter <[email protected]> wrote:
> >> > A couple of things to mention about the cluster (slow) node.
> >> > cat /proc/sys/kernel/vsyscall64 = 0
> >>
> >> Yup, if /proc/sys/kernel/vsyscall64 is 0, then gettimeofday() and
> >> friends are real system calls...
> >>
> >> So doing ~1B real system calls to get the same time(2) data is really
> >> bad for performance!
> >>
> >>
> >> > the timing loop posted by Mark took > 1m
> >> > I'll implement the suggestions by Mark and Rayson and see what happens!
> >>
> >> And replying to Mark's previous email: I agree with Mark that we need
> >> to call time(2) once and save it somewhere, but it is a bigger change
> >> and I am not sure if we will need a newer SLURM version to ship the
> >> change with. The patch I sent is a second choice; my first suggestion
> >> in the email was to call time(2) once and pass the time data into the
> >> comparison function (but that would still require some modifications in
> >> various places).
> >
> > The following patch is a stopgap that should initialize
> > "now" only once, preventing unnecessary calls to time(2) and ensuring
> > the value of "now" doesn't change during a sort:
> >
> > It isn't ideal since there will be an extraneous comparison
> > for each call to _get_start_time, but it should reduce the calls
> > to time(2) to one.
> >
> >
> > From 318bc74372b7a73b0936e47a7441448b4a693ed7 Mon Sep 17 00:00:00 2001
> > From: Mark A. Grondona <[email protected]>
> > Date: Mon, 7 Nov 2011 13:19:16 -0800
> > Subject: [PATCH] squeue: Only call time(2) once when sorting by job
> > start time
> >
> > Since the list_sort algorithm is O(n^2), we want to avoid any
> > unnecessary work in sort compare functions, especially work
> > that is duplicated. In the specific case of _get_start_time(),
> > we also want to avoid returning a different start time for
> > PENDING jobs during a single sort (which may happen if the
> > sort takes >1s).
> >
> > So, as a stopgap measure, define "now" as static in _get_start_time
> > so that we only initialize it once.
> > ---
> >  src/squeue/sort.c |   10 +++++++---
> >  1 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/squeue/sort.c b/src/squeue/sort.c
> > index af194ef..b75dc99 100644
> > --- a/src/squeue/sort.c
> > +++ b/src/squeue/sort.c
> > @@ -583,12 +583,16 @@ static int _sort_job_by_time_limit(void *void1, void *void2)
> >
> >  static uint32_t _get_start_time(job_info_t *job)
> >  {
> > -     time_t now = time(NULL);
> > +     static time_t now = (time_t) 0;
> >
> >       if (job->start_time == (time_t) 0)
> >               return 0xffffffff;
> > -     if ((job->job_state == JOB_PENDING) && (job->start_time < now))
> > -             return (uint32_t) now;
> > +     if (job->job_state == JOB_PENDING) {
> > +             if (now == (time_t) 0)
> > +                     now = time(NULL);
> > +             if (job->start_time < now)
> > +                     return (uint32_t) now;
> > +     }
> >       return (uint32_t) job->start_time;
> >  }
> >
> > --
> > 1.7.1
> >
> >
> >
> >
> >
> >
> >> Rayson
> >>
> >>
> >>
> >> > Thanks,
> >> > Clay
> >> >> On Sat, Nov 5, 2011 at 8:08 AM, Mark A. Grondona <[email protected]> wrote:
> >> >>
> >> >> On Fri, 4 Nov 2011 20:33:31 -0700, Rayson Ho <[email protected]>
> >> >> wrote:
> >> >> > On Fri, Nov 4, 2011 at 9:58 PM, Mark A. Grondona <[email protected]>
> >> >> > wrote:
> >> >> > > This is definitely true, but the fact that Clay's debugger caught
> >> >> > > squeue in the time(2) system call means something strange might be
> >> >> > > going on with that call.
> >> >> >
> >> >> > That's possible, if it is running on an old Linux kernel, or if
> >> >> > /proc/sys/kernel/vsyscall64 is 0.
> >> >>
> >> >> Also depends on the timer used by gettimeofday.
> >> >>
> >> >> > (And BTW, that's 100M calls)
> >> >>
> >> >> Oh, yeah, I increased it to get >1s runtime on my system. sorry!
> >> >>
> >> >> > > Clay, you might try running the following
> >> >> > > benchmark on your system just to see how long 10M time(2) calls 
> >> >> > > take:
> >> >> >
> >> >> > According to the comments in src/common/list.c, list_sort() has a
> >> >> > complexity of O(n^2). So with 30K jobs, an n^2 sort means roughly
> >> >> > 900M calls to time(2)!!
> >> >>
> >> >> Yeah, many of the sort compare functions in squeue have this problem.
> >> >> Lots of code needlessly recomputes the same values over and over again
> >> >> (n^2 - n times, right?)
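To put a number on that, here is a toy, non-SLURM example: an O(n^2)
selection sort performs exactly n*(n-1)/2 comparator calls, so any
work done inside the comparator is paid roughly n^2/2 times (the
exact constant depends on the sort algorithm):

```c
#include <assert.h>
#include <stddef.h>

/* Toy illustration (not SLURM code): count how often an O(n^2)
 * sort invokes its comparator. Imagine a time(2) call in cmp(). */
static unsigned long compare_count;

static int cmp(int a, int b)
{
	compare_count++;		/* per-compare work is paid here */
	return (a > b) - (a < b);
}

static void selection_sort(int *v, size_t n)
{
	/* selection sort: exactly n*(n-1)/2 calls to cmp() */
	for (size_t i = 0; i + 1 < n; i++) {
		size_t min = i;
		for (size_t j = i + 1; j < n; j++)
			if (cmp(v[j], v[min]) < 0)
				min = j;
		int tmp = v[i]; v[i] = v[min]; v[min] = tmp;
	}
}
```

For n = 30000 that is ~450M comparator calls, which matches the
back-of-the-envelope numbers above.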
> >> >>
> >> >> > If that really is the case, then it would be slow to execute almost 1
> >> >> > billion time(2) calls even if the overhead is really low for each
> >> >> > call. (In the end, a VDSO call is not free)
> >> >>
> >> >> FWIW, I tried to reproduce Clay's problem on a test system
> >> >> (though I only got up to 20K jobs), and was not able to.
> >> >>
> >> >>
> >> >> > I am not sure how job->start_time gets assigned in the code (the
> >> >> > header says "time_t start_time;      /* time execution begins, actual
> >> >> > or expected */", so I would expect it to be unset for pending jobs
> >> >> > without an expected start time. I don't have a SLURM cluster that I can
> >> >> > fool around with and attach a debugger to, and there are too many
> >> >> > start_time variables in the code blocking me from digging
> >> >> > deeper!!).
> >> >> >
> >> >> > - If job->start_time really represents the expected start time for
> >> >> > every job, then I guess calling time(2) once and passing the value to
> >> >> > the comparison function would make sense.
> >> >> >
> >> >> > - But if the common case start_time has a value of 0, then the
> >> >> > following patch should help (I've checked with gcc 4.4.5, even at -O3,
> >> >> > there's no sinking of time(NULL) below the first if statement, so we
> >> >> > need to do it ourselves):
> >> >>
> >> >> Your patch is a good start, I would think. However, "now" is
> >> >> still needlessly recomputed for every pending job (I _think_ a
> >> >> pending job's start_time is set to submit time, until it actually
> >> >> starts). So if you have 30K pending jobs you'd be in the same boat.
> >> >>
> >> >> Also, think about what happens when the sort takes >1s. The start
> >> >> time of these pending jobs starts to change! That seems wrong.
> >> >>
> >> >> The best fix for this one problem (if it is the problem) would be
> >> >> to set a global "now" that doesn't change, so the time(2) call
> >> >> only needs to be made once, and every pending job gets the same
> >> >> value for start_time.
> >> >>
> >> >> In general, the sorting and other code in squeue could be greatly
> >> >> improved by precomputing or otherwise memoizing results
> >> >> (i.e. storing results in a hash that could be checked before
> >> >> recomputing a result you already have). A good example
> >> >> is _sort_job_by_node_list(). That function does 'a lot' of work,
> >> >> and like Rayson said, it could be called up to 900M times
> >> >> needlessly for 30K jobs!
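A rough sketch of the precomputing idea (invented names, not squeue's
actual code; "now" is computed once by the caller and passed in, per
Rayson's first suggestion):

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Sketch (not actual squeue code): compute each job's sort key once,
 * O(n), so the comparator is a cheap integer compare instead of
 * redoing the same work n^2 - n times. */
struct job {
	time_t start_time;
	unsigned key;		/* precomputed sort key */
};

static int cmp_by_key(const void *a, const void *b)
{
	const struct job *ja = a, *jb = b;
	return (ja->key > jb->key) - (ja->key < jb->key);
}

/* "now" is computed once by the caller and passed in. */
static void sort_jobs(struct job *jobs, size_t n, time_t now)
{
	for (size_t i = 0; i < n; i++)	/* O(n) key precompute */
		jobs[i].key = (unsigned)(jobs[i].start_time < now
					 ? now : jobs[i].start_time);
	qsort(jobs, n, sizeof(*jobs), cmp_by_key);
}
```

This also guarantees every pending job sees the same "now" even if
the sort takes longer than a second.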
> >> >>
> >> >> BTW, in my testing, profiling showed that the slowest part of
> >> >> squeue was the uid_to_string() call, which is called at least
> >> >> once for every job (in my case, every job had the same uid,
> >> >> so that call should have only been made once). Removing that call
> >> >> more than halved the runtime of squeue.
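A memoized lookup could look roughly like this (a sketch with invented
names; slow_uid_to_string() stands in for the real getpwuid-style
resolver, which is the expensive part):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch (not actual squeue code): memoize uid -> name lookups so a
 * job list where every job shares one uid costs one slow lookup. */
struct uid_ent {
	unsigned uid;
	int valid;
	char name[64];
};

#define UID_CACHE_SIZE 64
static struct uid_ent uid_cache[UID_CACHE_SIZE];

/* Stand-in for the real, slow getpwuid()-style resolver. */
static void slow_uid_to_string(unsigned uid, char *buf, size_t len)
{
	snprintf(buf, len, "user%u", uid);
}

const char *cached_uid_to_string(unsigned uid)
{
	struct uid_ent *e = &uid_cache[uid % UID_CACHE_SIZE];

	if (!e->valid || e->uid != uid) {  /* miss: do the slow lookup */
		slow_uid_to_string(uid, e->name, sizeof(e->name));
		e->uid = uid;
		e->valid = 1;
	}
	return e->name;			   /* hit: no lookup at all */
}
```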
> >> >>
> >> >> mark
> >> >>
> >> >> >
> >> >> > [rayson@computer slurm-2.3.0-2]$ diff -U4 sort.c.original
> >> >> > src/squeue/sort.c
> >> >> > --- sort.c.original     2011-11-04 22:47:02.001064100 -0400
> >> >> > +++ src/squeue/sort.c   2011-11-04 22:47:55.187063769 -0400
> >> >> > @@ -582,13 +582,13 @@
> >> >> >  }
> >> >> >
> >> >> >  static uint32_t _get_start_time(job_info_t *job)
> >> >> >  {
> >> >> > -       time_t now = time(NULL);
> >> >> > +       time_t now;
> >> >> >
> >> >> >         if (job->start_time == (time_t) 0)
> >> >> >                 return 0xffffffff;
> >> >> > -       if ((job->job_state == JOB_PENDING) && (job->start_time < now))
> >> >> > +       if ((job->job_state == JOB_PENDING) && (job->start_time < (now = time(NULL))))
> >> >> >                 return (uint32_t) now;
> >> >> >         return (uint32_t) job->start_time;
> >> >> >  }
> >> >> >
> >> >> > Rayson
> >> >> >
> >> >> >
> >> >> > >
> >> >> > >  #include <time.h>
> >> >> > >  int main (int ac, char **av)
> >> >> > >  {
> >> >> > >        unsigned long i;
> >> >> > >        time_t t;
> >> >> > >        for (i = 0; i < 100000000UL; i++)
> >> >> > >            t = time(NULL);
> >> >> > >        return 0;
> >> >> > >  }
> >> >> > >
> >> >> > > On my system I get:
> >> >> > >
> >> >> > >  $ time ./timebench
> >> >> > >
> >> >> > >  real    0m3.794s
> >> >> > >  user    0m3.791s
> >> >> > >
> >> >> > >
> >> >> > > mark
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > >> Rayson
> >> >> > >>
> >> >> > >>
> >> >> > >>
> >> >> > >> > Moe, Danny, what do you think about that problem ?
> >> >> > >> >
> >> >> > >> > Regards,
> >> >> > >> > Matthieu
> >> >> > >> >
> >> >> > >> >
> >> >> > >> > On 11/04/2011 08:53 PM, Clay Teeter wrote:
> >> >> > >> >
> >> >> > >> > Ok, here is some general profiling info:
> >> >> > >> > the slurmctld process doesn't seem too terribly out of whack.  There
> >> >> > >> > are bursts of 100% CPU usage, but they lasted no longer than a
> >> >> > >> > couple of seconds.
> >> >> > >> > I ran gstack on the slurmctld process while squeue was running.
> >> >> > >> > Again, I didn't see anything that looked strange:
> >> >> > >> > Thread 5 (Thread 0x41fc6940 (LWP 21939)):
> >> >> > >> > #0  0x00002aaaaacb9df8 in select_p_job_test () from /usr/lib64/slurm/select_cons_res.so
> >> >> > >> > #1  0x00002aaaabad8cf3 in backfill_agent () from /usr/lib64/slurm/sched_backfill.so
> >> >> > >> > #2  0x0000003cc8e0673d in start_thread () from /lib64/libpthread.so.0
> >> >> > >> > #3  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
> >> >> > >> > Thread 4 (Thread 0x420c7940 (LWP 21941)):
> >> >> > >> > #0  0x0000003cc86cd722 in select () from /lib64/libc.so.6
> >> >> > >> > #1  0x000000000042daa2 in _slurmctld_rpc_mgr ()
> >> >> > >> > #2  0x0000003cc8e0673d in start_thread () from /lib64/libpthread.so.0
> >> >> > >> > #3  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
> >> >> > >> > Thread 3 (Thread 0x421c8940 (LWP 21942)):
> >> >> > >> > #0  0x0000003cc8e0e898 in do_sigwait () from /lib64/libpthread.so.0
> >> >> > >> > #1  0x0000003cc8e0e93d in sigwait () from /lib64/libpthread.so.0
> >> >> > >> > #2  0x000000000042b838 in _slurmctld_signal_hand ()
> >> >> > >> > #3  0x0000003cc8e0673d in start_thread () from /lib64/libpthread.so.0
> >> >> > >> > #4  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
> >> >> > >> > Thread 2 (Thread 0x422c9940 (LWP 21943)):
> >> >> > >> > #0  0x0000003cc8e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> >> > >> > #1  0x000000000046a9c1 in slurmctld_state_save ()
> >> >> > >> > #2  0x0000003cc8e0673d in start_thread () from /lib64/libpthread.so.0
> >> >> > >> > #3  0x0000003cc86d44bd in clone () from /lib64/libc.so.6
> >> >> > >> > Thread 1 (Thread 0x2aae5df000d0 (LWP 21933)):
> >> >> > >> > #0  0x0000003cc8e0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> >> > >> > #1  0x0000000000449cbf in _wr_wrlock ()
> >> >> > >> > #2  0x000000000044a1d5 in lock_slurmctld ()
> >> >> > >> > #3  0x000000000042af5e in _slurmctld_background ()
> >> >> > >> > #4  0x000000000042cb8c in main ()
> >> >> > >> > However, I did a gstack and strace on squeue and things did look
> >> >> > >> > a bit weird.
> >> >> > >> > strace:
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > time(NULL)                              = 1320435240
> >> >> > >> > .....
> >> >> > >> > keeps on going, this might be the sorting algo?
> >> >> > >> >
> >> >> > >> > gstack:
> >> >> > >> > #0  0xffffffffff600410 in ?? ()
> >> >> > >> > #1  0x0000003cc868c40d in time () from /lib64/libc.so.6
> >> >> > >> > #2  0x000000000042608b in _get_start_time ()
> >> >> > >> > #3  0x00000000004260e1 in _sort_job_by_time_start ()
> >> >> > >> > #4  0x0000000000437958 in slurm_list_sort ()
> >> >> > >> > #5  0x00000000004232af in print_jobs_array ()
> >> >> > >> > #6  0x0000000000420234 in _get_info ()
> >> >> > >> > #7  0x00000000004205a1 in main ()
> >> >> > >> > squeue is pegged at 100% the entire time.  I noticed that if I
> >> >> > >> > pass in the param -S u to squeue, the function
> >> >> > >> > _sort_job_by_time_start is still executed.
> >> >> > >> > Some interesting slurmctld log entries:
> >> >> > >> > [2011-11-04T19:50:37] Warning: Note very large processing time from _slurm_rpc_dump_jobs: usec=6155478
> >> >> > >> > [2011-11-04T19:50:37] Warning: Note very large processing time from schedule: usec=6158333
> >> >> > >> > [2011-11-04T19:50:37] Warning: Note very large processing time from _slurmctld_background: usec=15995841
> >> >> > >> >
> >> >> > >> > [2011-11-04T19:46:01] debug:  backfill: loop taking too long, yielding locks
> >> >> > >> > [2011-11-04T19:46:01] debug2: Testing job time limits and checkpoints
> >> >> > >> > [2011-11-04T19:46:01] debug2: Performing purge of old job records
> >> >> > >> > [2011-11-04T19:46:01] Warning: Note very large processing time from _slurmctld_background: usec=9779576
> >> >> > >> > Thanks for all your help,
> >> >> > >> > Clay
> >> >> > >> > On Fri, Nov 4, 2011 at 2:16 AM, HAUTREUX Matthieu
> >> >> > >> > <[email protected]>
> >> >> > >> > wrote:
> >> >> > >> >>
> >> >> > >> >> In my understanding of the internals, squeue does not query each
> >> >> > >> >> node but only the controller. It is interesting to know that the
> >> >> > >> >> performance issue seems to be due to the EC2 partition, but
> >> >> > >> >> further information is required to find the reason for such
> >> >> > >> >> bad performance.
> >> >> > >> >>
> >> >> > >> >> The best way to find the problem is to observe the internal
> >> >> > >> >> states of the controller while the "squeue -p aws" is running:
> >> >> > >> >> - You should first look at the consumption of the slurmctld
> >> >> > >> >> process to see if the issue is due to too high a processing
> >> >> > >> >> complexity in an internal function of slurm. The easiest way to
> >> >> > >> >> get that information is to use top and look at the consumption
> >> >> > >> >> of the slurmctld process.
> >> >> > >> >> - After that, in case you see no abnormal consumption, you
> >> >> > >> >> should try to capture the state of the slurmctld process during
> >> >> > >> >> the hang. You can do that just by using the gstack command to
> >> >> > >> >> capture the stacks of the slurmctld threads and see where they
> >> >> > >> >> are waiting. If you can send the result of "gstack
> >> >> > >> >> $(pgrep slurmctld)", it could bring interesting information.
> >> >> > >> >> - In case you do see abnormal consumption, gstack can help a
> >> >> > >> >> little, but a more advanced tool, a profiler such as oprofile,
> >> >> > >> >> will be necessary to see where most of the time is spent in the
> >> >> > >> >> controller.
> >> >> > >> >>
> >> >> > >> >> Do not hesitate to send the results of the first two tests (top,
> >> >> > >> >> gstack); they could help to find the bottleneck.
> >> >> > >> >>
> >> >> > >> >> Regards,
> >> >> > >> >> Matthieu
> >> >> > >> >>
> >> >> > >> >> Clay Teeter wrote:
> >> >> > >> >>>
> >> >> > >> >>> I did notice something.
> >> >> > >> >>>
> >> >> > >> >>> Issuing squeue against the local partition is quite speedy, ~2s
> >> >> > >> >>> for 10K job ids.  However, if I "squeue -p aws" the ec2 (aws)
> >> >> > >> >>> partition, I see the 2m+ delay.  Does squeue query each node in
> >> >> > >> >>> the partition?
> >> >> > >> >>>
> >> >> > >> >>> Thanks,
> >> >> > >> >>> Clay
> >> >> > >> >>>
> >> >> > >> >>> On Thu, Nov 3, 2011 at 2:42 PM, Clay Teeter <[email protected]
> >> >> > >> >>> <mailto:[email protected]>> wrote:
> >> >> > >> >>>
> >> >> > >> >>>    First, thanks for your response!
> >> >> > >> >>>
> >> >> > >> >>>    Yeah, I figured that slurm shouldn't have a problem with the
> >> >> > >> >>>    job count.
> >> >> > >> >>>    First some background.  Our network is split into two parts:
> >> >> > >> >>>    a local partition and an ec2 partition.  And, for each task,
> >> >> > >> >>>    two jobs (one dependent on the other) are submitted.  The first
> >> >> > >> >>>    part computes a large datafile and submits it to s3; a second
> >> >> > >> >>>    job pulls the files from s3 and copies them to our san.  In
> >> >> > >> >>>    total, a large job requires about 30K jobs to be submitted.
> >> >> > >> >>>
> >> >> > >> >>>    I'm not sure when squeue started slowing down.  Currently, via
> >> >> > >> >>>    sattr (about a 2s query), there are a little more than 17000
> >> >> > >> >>>    jobs running.
> >> >> > >> >>>
> >> >> > >> >>>    I'm worried that because half the cluster is in a remote
> >> >> > >> >>>    datacenter, latency is the problem... which, unfortunately,
> >> >> > >> >>>    would be a show stopper.
> >> >> > >> >>>
> >> >> > >> >>>    Version: slurm 2.3.0-2
> >> >> > >> >>>    ControlMachine=master
> >> >> > >> >>>    AuthType=auth/munge
> >> >> > >> >>>    CacheGroups=0
> >> >> > >> >>>    CryptoType=crypto/munge
> >> >> > >> >>>    MaxJobCount=30000
> >> >> > >> >>>    MpiDefault=none
> >> >> > >> >>>    MpiParams=ports=12000-12999
> >> >> > >> >>>    ProctrackType=proctrack/pgid
> >> >> > >> >>>    ReturnToService=1
> >> >> > >> >>>    SlurmctldPidFile=/var/run/slurm/slurmctld.pid
> >> >> > >> >>>    SlurmctldPort=6817
> >> >> > >> >>>    SlurmdPidFile=/var/run/slurm/slurmd.pid
> >> >> > >> >>>    SlurmdPort=6818
> >> >> > >> >>>    SlurmdSpoolDir=/var/slurm/spool
> >> >> > >> >>>    SlurmUser=slurm
> >> >> > >> >>>    StateSaveLocation=/var/slurm/state
> >> >> > >> >>>    SwitchType=switch/none
> >> >> > >> >>>    TaskPlugin=task/none
> >> >> > >> >>>    TrackWCKey=yes
> >> >> > >> >>>    InactiveLimit=0
> >> >> > >> >>>    KillWait=30
> >> >> > >> >>>    MinJobAge=300
> >> >> > >> >>>    SlurmctldTimeout=120
> >> >> > >> >>>    SlurmdTimeout=300
> >> >> > >> >>>    Waittime=0
> >> >> > >> >>>    FastSchedule=1
> >> >> > >> >>>    SchedulerType=sched/backfill
> >> >> > >> >>>    SchedulerPort=7321
> >> >> > >> >>>    SelectType=select/cons_res
> >> >> > >> >>>    SelectTypeParameters=CR_Core
> >> >> > >> >>>    AccountingStorageHost=master
> >> >> > >> >>>    AccountingStorageLoc=slurm_acct_db
> >> >> > >> >>>    AccountingStoragePass=
> >> >> > >> >>>    AccountingStorageType=accounting_storage/mysql
> >> >> > >> >>>    AccountingStorageUser=
> >> >> > >> >>>    AccountingStoreJobComment=YES
> >> >> > >> >>>    ClusterName=hydra
> >> >> > >> >>>    JobCompHost=master
> >> >> > >> >>>    JobCompLoc=slurm_acct_db
> >> >> > >> >>>    JobCompPass=cluster_service
> >> >> > >> >>>    JobCompType=jobcomp/mysql
> >> >> > >> >>>    JobCompUser=cluster_service
> >> >> > >> >>>    JobAcctGatherFrequency=30
> >> >> > >> >>>    JobAcctGatherType=jobacct_gather/linux
> >> >> > >> >>>    SlurmctldDebug=3
> >> >> > >> >>>    SlurmctldLogFile=/var/log/slurm/slurmctld.log
> >> >> > >> >>>    SlurmdDebug=3
> >> >> > >> >>>    SlurmdLogFile=/var/log/slurm/slurmd.log
> >> >> > >> >>>    NodeName=aws[1-12] RealMemory=22528 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
> >> >> > >> >>>    NodeName=centos1 RealMemory=3000 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
> >> >> > >> >>>
> >> >> > >> >>>    PartitionName=local Nodes=centos1 Default=YES MaxTime=INFINITE State=UP Shared=NO
> >> >> > >> >>>    PartitionName=all Nodes=aws[3-12],centos1 Default=NO MaxTime=INFINITE State=UP Shared=NO
> >> >> > >> >>>    PartitionName=aws Nodes=aws[2-12] Default=NO MaxTime=INFINITE State=UP Shared=NO
> >> >> > >> >>>
> >> >> > >> >>>
> >> >> > >> >>>
> >> >> > >> >>>    On Thu, Nov 3, 2011 at 2:03 PM, Matthieu Hautreux
> >> >> > >> >>>    <[email protected] <mailto:[email protected]>>
> >> >> > >> >>> wrote:
> >> >> > >> >>>
> >> >> > >> >>>        Hi,
> >> >> > >> >>>
> >> >> > >> >>>        What version of SLURM are you using? Can you send
> >> >> > >> >>>        your slurm.conf for the details of your configuration?
> >> >> > >> >>>
> >> >> > >> >>>        With 2.3.1 you should be able to manage at least 10k jobs.
> >> >> > >> >>>        We ran tests with 2.2.7 plus backported patches from 2.3.x
> >> >> > >> >>>        that showed good results with 10K jobs.
> >> >> > >> >>>        At what number of jobs does the system start to hang?
> >> >> > >> >>>
> >> >> > >> >>>        Matthieu
> >> >> > >> >>>
> >> >> > >> >>>
> >> >> > >> >>>        On 11/03/2011 07:03 PM, Clay Teeter wrote:
> >> >> > >> >>>>
> >> >> > >> >>>>        Hi group!
> >> >> > >> >>>>
> >> >> > >> >>>>        I'm seeing that with about 30K (or so) jobs in the queue,
> >> >> > >> >>>>        squeue hangs for around 2 minutes before returning values.
> >> >> > >> >>>>        Does anyone know a way around this?  I expect that this
> >> >> > >> >>>>        isn't standard behavior.
> >> >> > >> >>>>
> >> >> > >> >>>>        Cheers,
> >> >> > >> >>>>        Clay
> >> >> > >> >>>
> >> >> > >> >>>
> >> >> > >> >>>
> >> >> > >> >>
> >> >> > >> >
> >> >> > >> >
> >> >> > >> >
> >> >> > >>
> >> >> > >>
> >> >> > >>
> >> >> > >> --
> >> >> > >>
> >> >> > >> ==================================================
> >> >> > >> Open Grid Scheduler - The Official Open Source Grid Engine
> >> >> > >> http://gridscheduler.sourceforge.net/
> >> >> > >>
> >> >> > >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >>
> >
> >
> 
> 
> 
