Hi Ben,
On 06/02/13 13:12, Ben De Luca wrote:
I'm fairly sure we are affected by this bug too. I am happy to help in
the hunt, and I have looked through the code more than once.
Are you doing anything to work around it at all? At the moment, we're
adjusting the shares to accommodate the over-accounting - but that is a
very blunt tool and skews our allocations massively. We're reluctant to
go for purely functional shares, as our service definition is currently
fixed on fair share.
Which version of grid are you trying to fix? I haven't been following
grid dev too closely; do we still have multiple forks?
We notice it most on our current 6.2u5 deployment, which we're moving
away from in favour of 8.0.0e from Son of Grid Engine. That's not to say
it isn't present in 8.0.0e - we still have a lot of the troublesome
workload on the 6.2u5 cluster, and I'm sure I've seen it happening on
the 8.0.0e cluster (though I no longer have any evidence of that).
--
Orlando
On Wed, Feb 6, 2013 at 12:07 PM, Mark Dixon <[email protected]> wrote:
On Wed, 6 Feb 2013, Orlando Richards wrote:
...
I've had a go at digging through the code, but couldn't really make
head nor tail of it - no doubt in large part due to my not being much
of a coder :( Any pointers to get me bootstrapped would be most welcome.
General comments about the source...
Don't be intimidated. It's a large code base, but spend a little
time and it'll start to make sense. Pick a little bit of it to focus
on initially.
Gridengine's source code is layered. The source distribution has a
few HTML files describing them (some of which still need updating
from the 6.0 days...). Functions near the very top and very bottom
of the stack are relatively well commented, but the rest can be a
little hit and miss.
Ignoring most of the layers, you've essentially got:
At the bottom you've got the wonderful CULL layer: it's very solid
and provides gridengine with safe complicated data structures. I'd
like to pat the person who wrote it on the back, although I admit
I've yet to get my head round the advanced search functionality.
State data for jobs and so on tends to use it. You can spot its use by
the data types and functions prefixed with "l".
While I'm on data structures, there are also "dstrings" - which
provide safe string handling.
In the middle you've got the GDI, which is the set of libraries used
by the different components to communicate with each other over the
network.
At the top you've got the qmaster, execd, etc., which can be thought
of as loosely coupled applications that all use the same underlying
libraries/layers to coordinate.
I've spent most of my time in the execd, which is pretty easy but
messy [a very large number of special cases - not totally unexpected
with the number of platforms supported over the years, but ripe for
some refactoring]. I've had a brief play in the qmaster and my first
impression is that it's more consistent and "solid" than the execd,
but more complicated.
General tips for debugging gridengine...
1) Play with the loglevel setting in "qconf -sconf" and read the
messages files.
2) Figure out how to stick gridengine into debug mode.
https://blogs.oracle.com/templedf/entry/using_debugging_output
Essentially something like:
* Set up the SGE environment (SGE_ROOT, SGE_QMASTER_PORT, etc.)
* Execute: . $SGE_ROOT/util/dl.sh
* Execute: dl 1
* Execute: $SGE_ROOT/bin/lx-amd64/sge_execd
The program will not daemonise and will print lots of interesting
stuff. Different 'dl' values will give you different output. I
generally find that anything greater than 1 is "too much".
This technique will work for pretty much any gridengine component.
Even qsub.
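The debug-mode steps above could be wrapped in a small shell function
along these lines. This is only a sketch: the function name
run_sge_debug and the SGE_ARCH override are made up for illustration,
and the lx-amd64 default is just the architecture used in the example
above, so adjust it for your platform.

```shell
#!/bin/sh
# Sketch: run a gridengine component in the foreground with debug output.
# Assumptions: SGE_ROOT (and the rest of the SGE environment) is already
# set; SGE_ARCH is a hypothetical override used only by this sketch.

run_sge_debug() {
    component=${1:-sge_execd}      # binary to run, e.g. sge_execd
    level=${2:-1}                  # value passed to "dl"
    arch=${SGE_ARCH:-lx-amd64}     # architecture directory under bin/

    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set" >&2
        return 1
    fi
    if [ ! -r "$SGE_ROOT/util/dl.sh" ]; then
        echo "cannot read $SGE_ROOT/util/dl.sh" >&2
        return 1
    fi
    binary="$SGE_ROOT/bin/$arch/$component"
    if [ ! -x "$binary" ]; then
        echo "no executable at $binary" >&2
        return 1
    fi

    # dl.sh defines the "dl" function; "dl N" selects the debug level.
    . "$SGE_ROOT/util/dl.sh"
    dl "$level"

    # The component stays in the foreground and prints its trace.
    "$binary"
}
```

Calling "run_sge_debug sge_execd 1" then corresponds to the four steps
listed above; redirect the output to a file if you want to search it
afterwards.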
3) Run gridengine under gdb.
I don't know if you've had much experience with gdb but, once you've
got the hang of it, it's very useful in figuring out what some code
generally does without actually understanding the details. Once
you've followed your nose to something that doesn't look right, you
can then spend time figuring things out.
I think some of the gridengine forks try to provide builds with
enough debugging information for this to work, but I tend to build
my own gridengine so that I can easily recompile after editing the
source with potential fixes.
Make sure you build with the "-no-opt" and "-debug" flags to aimk
(disables optimisation and enables debugging symbols) and keep the
source tree kicking around for gdb to read. I run our production
gridengine with those flags and haven't noticed any serious
performance problems.
Once you have gridengine running under gdb and are playing with
breakpoints and the rest, you can easily examine interesting data
structures with commands like "p lWriteList(ptr)",
"p lWriteElem(ptr)" and "p sge_dstring_get_string(ptr)" (where ptr is
a lList*, lListElem* or dstring*, respectively).
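If you find yourself typing those print commands a lot, a tiny helper
can generate them for pasting into a gdb session or a command file.
A minimal sketch; the function name gdb_print_cmd and the kind names
(list, elem, dstring) are invented here, only the three gdb commands
come from the text above.

```shell
# Emit the gdb command that dumps a gridengine data structure.
# $1 is the kind of pointer (list, elem or dstring); $2 is the
# expression that evaluates to that pointer in the current gdb frame.
gdb_print_cmd() {
    case $1 in
        list)    printf 'p lWriteList(%s)\n' "$2" ;;
        elem)    printf 'p lWriteElem(%s)\n' "$2" ;;
        dstring) printf 'p sge_dstring_get_string(%s)\n' "$2" ;;
        *)       echo "unknown kind: $1" >&2; return 1 ;;
    esac
}
```

For example, "gdb_print_cmd list ptr >> cmds.gdb" appends the command
to a file that can be read into gdb with its "source" command once you
are stopped at a breakpoint where ptr is in scope.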
...
At the moment, I'm trying to get a reproducible test case together to
allow for useful debugging - basic tests (sleep 60s) don't show an
obvious triggering of the issue, so I'm moving on to more complicated
tasks. Certainly, the issue does seem to create orders-of-magnitude
differences in reported usage. Current offenders include BLAST jobs
(run by our Biology users) - which are fairly memory heavy.
...
Being able to reproduce the problem will obviously make things far,
far easier! If you cannot, you're probably reduced to littering the
relevant qmaster code with INFO(())/WARNING(())/ERROR(()) statements
(and checking that loglevel in "qconf -sconf" is set to the
appropriate value) and seeing what appears in the messages files in
production.
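Before adding logging statements it is worth checking that they would
actually reach the messages file. The sketch below reads the loglevel
out of "qconf -sconf" and reports whether a given class of message is
logged at that level. The function names are made up, and the mapping
of INFO(())/WARNING(())/ERROR(()) to the info/warning/error classes is
an assumption based on the macro names; the three loglevel values
(log_err, log_warning, log_info) are the documented ones.

```shell
# Read "qconf -sconf" output on stdin and print the loglevel value.
current_loglevel() {
    awk '$1 == "loglevel" { print $2; exit }'
}

# Succeed if messages of class $1 (info, warning or error) are written
# to the messages file when the loglevel is $2.
loglevel_shows() {
    case "$1:$2" in
        info:log_info) return 0 ;;
        warning:log_info|warning:log_warning) return 0 ;;
        error:log_info|error:log_warning|error:log_err) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical use would be to capture the level with
level=$(qconf -sconf | current_loglevel) and then run
loglevel_shows info "$level" to see whether INFO(()) output would show
up before you go hunting for it in the messages files.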
If you're lucky, the problem might be evident in the usage
information being sent from the execd to the qmaster. Running the
execd in debug mode with "dl 1" will reveal what CPU/MEM/IO values
the qmaster is being given to be used in the accounting file and the
share tree.
If you're unlucky, the problem is in how the qmaster aggregates,
records and decays the share tree values over time.
If you're really unlucky, the problem might only occur if the
various gridengine components are under severe stress.
I find that having a non-production installation of gridengine
kicking around, perhaps in virtual machines, is very handy :)
Hope this helps...
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_________________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
--
Dr Orlando Richards
Information Services
IT Infrastructure Division
Unix Section
Tel: 0131 650 4994
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.