Hi Ben,

On 06/02/13 13:12, Ben De Luca wrote:
I'm fairly sure we are affected by this bug too. I am happy to help in
the hunt, and I have looked through the code more than once.


Are you doing anything to work around it at all? At the moment, we're adjusting the shares to accommodate the over-accounting - but that is a very blunt tool and skews our allocations massively. We're reluctant to go for purely functional shares, as our service definition is currently fixed on fair share.

Which version of grid are you trying to fix? I haven't been following
grid dev too closely; do we still have multiple forks?


We notice it most on our current 6.2u5 deployment, which we're moving away from in favour of Son of Grid Engine 8.0.0e. That's not to say it isn't present in 8.0.0e - we still have a lot of the troublesome workload on the 6.2u5 cluster, and I'm sure I've seen it happening on the 8.0.0e cluster (though I now don't have any evidence of that).


--
Orlando




On Wed, Feb 6, 2013 at 12:07 PM, Mark Dixon <[email protected]> wrote:

    On Wed, 6 Feb 2013, Orlando Richards wrote:
    ...

        I've had a go at digging through the code, but couldn't really
        make head nor tail of it - no doubt in large part due to my not
        being much of a coder :( Any pointers to get me bootstrapped
        would be most welcome.


    General comments about the source...

    Don't be intimidated. It's a large code base, but spend a little
    time and it'll start to make sense. Pick a little bit of it to focus
    on initially.

    Gridengine's source code is layered. The source distribution has a
    few HTML files describing them (some of which still need updating
    from the 6.0 days...). Functions near the very top and very bottom
    of the stack are relatively well commented, but the rest can be a
    little hit and miss.

    Ignoring most of the layers, you've essentially got:

    At the bottom you've got the wonderful CULL layer: it's very solid
    and provides gridengine with safe, complex data structures. I'd
    like to pat the person who wrote it on the back, although I admit
    I've yet to get my head round the advanced search functionality.
    State data for jobs and so on tends to use it; you can spot CULL
    usage by the data types and functions prefixed with "l".

    While I'm on data structures, there are also "dstrings" - which
    provide safe string handling.

    In the middle you've got the GDI, which is the set of libraries used
    by the different components to communicate with each other over the
    network.

    At the top you've got the qmaster, execd, etc., which can be thought
    of as loosely coupled applications that all use the same underlying
    libraries/layers to coordinate.

    I've spent most of my time in the execd, which is pretty easy but
    messy [a very large number of special cases - not totally unexpected
    with the number of platforms supported over the years, but ripe for
    some refactoring]. I've had a brief play in the qmaster and my first
    impression is that it's more consistent and "solid" than the execd,
    but more complicated.


    General tips for debugging gridengine...

    1) Play with the loglevel setting in "qconf -sconf" and read the
    messages files.


    2) Figure out how to stick gridengine into debug mode.
    https://blogs.oracle.com/templedf/entry/using_debugging_output

    Essentially something like:
       * Setup sge environment (SGE_ROOT, SGE_QMASTER_PORT, etc.)
       * Execute: . $SGE_ROOT/util/dl.sh
       * Execute: dl 1
       * Execute: $SGE_ROOT/bin/lx-amd64/sge_execd

    The program will not daemonise and will print lots of interesting
    stuff. Different 'dl' values will give you different output. I
    generally find that anything greater than 1 is "too much".

    This technique will work for pretty much any gridengine component.
    Even qsub.
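    Putting those steps together, a typical session might look like the
    following (the SGE_ROOT path, port numbers and lx-amd64 arch string
    are site-specific assumptions - adjust them for your installation):

    ```shell
    # Set up the SGE environment by hand; most sites instead source
    # $SGE_ROOT/default/common/settings.sh, which does the same thing.
    export SGE_ROOT=/opt/sge
    export SGE_CELL=default
    export SGE_QMASTER_PORT=6444
    export SGE_EXECD_PORT=6445

    # dl.sh defines the 'dl' shell function, which exports the debug
    # level variables that the gridengine binaries check at startup.
    . $SGE_ROOT/util/dl.sh
    dl 1    # level 1: interesting without being overwhelming

    # Run the component in the foreground; it will not daemonise and
    # will print its trace output to the terminal.
    $SGE_ROOT/bin/lx-amd64/sge_execd
    ```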


    3) Run gridengine under gdb.

    I don't know if you've had much experience with gdb but, once you've
    got the hang of it, it's very useful in figuring out what some code
    generally does without actually understanding the details. Once
    you've followed your nose to something that doesn't look right, you
    can then spend time figuring things out.

    I think some of the gridengine forks try to provide builds with
    enough debugging information for this to work, but I tend to build
    my own gridengine so that I can easily recompile after editing the
    source with potential fixes.

    Make sure you build with the "-no-opt" and "-debug" flags to aimk
    (disables optimisation and enables debugging symbols) and keep the
    source tree kicking around for gdb to read. I run our production
    gridengine with those flags and haven't noticed any serious
    performance problems.

    Once you have gridengine running under gdb and playing with
    breakpoints and the rest, you can easily examine interesting data
    structures with commands like "p lWriteList(ptr)",
    "p lWriteElem(ptr)" and "p sge_dstring_get_string(ptr)" (where ptr
    is an lList*, lListElem* or dstring*, respectively).
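    A session might look something like this - a sketch only, assuming
    a -no-opt/-debug build; the breakpoint function and the variable
    names (job_list, jep, str) are illustrative, not something to copy
    verbatim:

    ```
    $ gdb $SGE_ROOT/bin/lx-amd64/sge_qmaster
    (gdb) break sge_c_gdi            # pick a function relevant to your hunt
    (gdb) run
    ...
    (gdb) p lWriteList(job_list)     # dump an lList* in readable form
    (gdb) p lWriteElem(jep)          # dump a single lListElem*
    (gdb) p sge_dstring_get_string(&str)   # read a dstring's contents
    ```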


    ...

        At the moment, I'm trying to get a reproducible test case
        together to allow for useful debugging - basic tests (sleep
        60s) don't show an obvious triggering of the issue, so I'm
        moving onto more complicated tasks. Certainly, the issue does
        seem to create orders-of-magnitude differences in reported
        usage. Current offenders include BLAST jobs (run by our Biology
        users) - which are fairly memory heavy.

    ...

    Being able to reproduce the problem will obviously make things far,
    far easier! If you cannot, you're probably reduced to littering the
    relevant qmaster code with INFO(())/WARNING(())/ERROR(()) statements
    (and checking that loglevel in "qconf -sconf" is set to the
    appropriate value) and seeing what appears in the messages files in
    production.
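    If it comes to littering, the idea looks roughly like this - a
    minimal sketch using stand-in macros (the real gridengine
    INFO/WARNING/ERROR macros take a double-parenthesised argument list
    and write to the messages file subject to loglevel; the decay
    function and all names here are purely illustrative):

    ```c
    #include <stdio.h>

    /* Stand-in logging macros. The real gridengine macros are invoked
     * with double parentheses, e.g. INFO((SGE_EVENT, "...")), and
     * their output lands in the messages file. */
    #define INFO(...)    do { fprintf(stderr, "I|"); \
                              fprintf(stderr, __VA_ARGS__); \
                              fprintf(stderr, "\n"); } while (0)
    #define WARNING(...) do { fprintf(stderr, "W|"); \
                              fprintf(stderr, __VA_ARGS__); \
                              fprintf(stderr, "\n"); } while (0)

    /* Hypothetical share-tree decay step, instrumented so the values
     * flowing through it show up in the log. */
    static double decay_usage(double usage, double decay_factor)
    {
        INFO("decay_usage: before=%f factor=%f", usage, decay_factor);
        if (decay_factor < 0.0 || decay_factor > 1.0)
            WARNING("decay_usage: suspicious factor %f", decay_factor);
        usage *= decay_factor;
        INFO("decay_usage: after=%f", usage);
        return usage;
    }

    int main(void)
    {
        printf("%f\n", decay_usage(100.0, 0.5));
        return 0;
    }
    ```

    Watching the before/after pairs in the messages file over a few
    decay intervals is often enough to spot values jumping by orders of
    magnitude.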

    If you're lucky, the problem might be evident in the usage
    information being sent from the execd to the qmaster. Running the
    execd in debug mode with "dl 1" will reveal what CPU/MEM/IO values
    the qmaster is being given to be used in the accounting file and the
    share tree.

    If you're unlucky, the problem is in how the qmaster aggregates,
    records and decays the share tree values over time.

    If you're really unlucky, the problem might only occur if the
    various gridengine components are under severe stress.

    I find that having a non-production installation of gridengine
    kicking around, perhaps in virtual machines, is very handy :)

    Hope this helps...


    Mark
    --
    -----------------------------------------------------------------
    Mark Dixon                       Email    : [email protected]
    HPC/Grid Systems Support         Tel (int): 35429
    Information Systems Services     Tel (ext): +44(0)113 343 5429
    University of Leeds, LS2 9JT, UK
    -----------------------------------------------------------------
    _______________________________________________
    users mailing list
    [email protected]
    https://gridengine.org/mailman/listinfo/users




--
   Dr Orlando Richards
  Information Services
IT Infrastructure Division
       Unix Section
    Tel: 0131 650 4994

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
