From: [EMAIL PROTECTED] on behalf of Babu Sundaram
Sent: Mon 10/07/2006 14:03
To: [email protected]
Subject: [Oscar-devel] Weekly update - SoC JobMonarch & Ganglia
Hi All:

Here is my update for the past week. I was holding this e-mail until I had
checked the code into the Subversion repo, which I did a few minutes ago.
So, here is the list of things accomplished:

Mainly, I got the jobmond.py implementation completed and working with SGE.
As a result of some issues with DRMAA in SGE, the current code is not 100%
DRMAA. From the SGE-devel mailing list, it appears that these will be sorted
out by the 6.0u9 release of SGE. I will keep the pure DRMAA code in place,
commented out, so we can easily enable it as and when DRMAA support is
available in SGE. More info about the issue and my discussion on SGE-devel
can be found at:
http://gridengine.sunsource.net/servlets/ReadMsg?list=dev&msgNo=2771
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1485

Here are some notes about the current jobmond.py:

- It collects the SGE job info by running qstat -ext -xml and puts the
  output into a file (the location of which is set in jobmond.conf).
- I have implemented an XML parser (based on SAX) for sifting through this
  and collecting the requisite information about all the jobs. (A DOM parser
  might slow things down when SGE returns verbose XML for a large number of
  jobs, and we don't need write access to the XML tree, so I went with SAX.)
- Jobs with no change in their status are reported as such; new jobs are
  added; jobs with changed status are updated.
- This information is formed into a dictionary with the job IDs acting as
  keys and the corresponding values holding the job status information.

  An example is below. The key is '169':

Status=pending
JB_job_number=169
JAT_prio=0.00000
JAT_ntix=0.00000
JB_name=Sleeper
JB_owner=babu
JB_project=Unknown
JB_department=defaultdepartment
state=qw
tickets=0
JB_override_tickets=0
JB_jobshare=0
otickets=0
ftickets=0
stickets=0
JAT_share=0.00000
queue_name=sample.q
slots=1

  This info is collected for every job that is currently under the control
  of SGE.
- This information is then multicast using the gmetric tool of Ganglia so
  as to be available via the gmond daemons.
- The above steps are repeated every BATCH_POLL_INTERVAL seconds, as set in
  jobmond.conf.
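
To make the parsing and update steps above a bit more concrete, here is a minimal sketch, not the actual jobmond.py code. It assumes that qstat -xml wraps each job in a <job_list state="..."> element whose children are the fields shown in the example; the sample XML, the class name QstatHandler and the helpers parse_qstat and merge_jobs are all made up for illustration.

```python
import xml.sax

# Hypothetical, trimmed-down qstat -xml output built from the fields in the
# example above. The real SGE output has more elements and fields.
SAMPLE_XML = """<job_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>169</JB_job_number>
      <JB_name>Sleeper</JB_name>
      <JB_owner>babu</JB_owner>
      <state>qw</state>
      <queue_name>sample.q</queue_name>
      <slots>1</slots>
    </job_list>
  </job_info>
</job_info>"""


class QstatHandler(xml.sax.ContentHandler):
    """Collect one dict of fields per <job_list> element, keyed by job ID."""

    def __init__(self):
        super().__init__()
        self.jobs = {}
        self._job = None   # fields of the job currently being read
        self._text = []    # character data of the current element

    def startElement(self, name, attrs):
        if name == "job_list":
            # Assumption: the state attribute carries pending/running.
            self._job = {"Status": attrs.get("state", "unknown")}
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._job is None:
            return  # element outside any job, nothing to record
        if name == "job_list":
            job_id = self._job.get("JB_job_number")
            if job_id is not None:
                self.jobs[job_id] = self._job
            self._job = None
        else:
            self._job[name] = "".join(self._text).strip()
        self._text = []


def parse_qstat(xml_text):
    """Parse qstat XML text and return {job_id: {field: value}}."""
    handler = QstatHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.jobs


def merge_jobs(known, current):
    """Update known in place; return (new, changed, unchanged) job IDs."""
    new, changed, unchanged = [], [], []
    for job_id, info in current.items():
        if job_id not in known:
            new.append(job_id)
        elif known[job_id]["Status"] != info["Status"]:
            changed.append(job_id)
        else:
            unchanged.append(job_id)
        known[job_id] = info
    return new, changed, unchanged
```

Calling parse_qstat(SAMPLE_XML) gives a dictionary keyed by '169', and merge_jobs can then be run once per polling interval against the dictionary from the previous pass. Because SAX only streams events, nothing but the per-job dictionaries is kept in memory.
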

Note: When the DRMAA issue is sorted out, the job status will be obtained
by establishing a DRMAA session. It would look like this:

    s = DRMAA.Session()
    s.init()
    for jobid in self.qstatparser.attribs:
        job_status[jobid] = s.getJobProgramStatus(jobid)
        ##...Code here for translating the DRMAA status code
        ##...into a string such as queued active, on hold, etc.,
        ##...and updating the status key in the attribs dictionary
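
The translation step could be sketched roughly as below. This is only an illustration under assumptions: the numeric codes are the job program status values from the DRMAA 1.0 C binding (drmaa.h), and the Python binding may expose them differently. The names DRMAA_STATUS, status_to_string and update_status are hypothetical, and get_status stands in for whatever session call ends up being used.

```python
# Job program status codes as defined in the DRMAA 1.0 C binding (drmaa.h).
# Assumption: the Python binding returns these same integer values.
DRMAA_STATUS = {
    0x00: "undetermined",
    0x10: "queued active",
    0x11: "system on hold",
    0x12: "user on hold",
    0x13: "user and system on hold",
    0x20: "running",
    0x21: "system suspended",
    0x22: "user suspended",
    0x23: "user and system suspended",
    0x30: "done",
    0x40: "failed",
}


def status_to_string(code):
    """Translate a DRMAA status code into a readable string."""
    return DRMAA_STATUS.get(code, "unknown (0x%02x)" % code)


def update_status(attribs, get_status):
    """Set the Status key of every job in attribs.

    get_status is a callable taking a job ID and returning a status code;
    it is a placeholder for the DRMAA session call.
    """
    for jobid, info in attribs.items():
        info["Status"] = status_to_string(get_status(jobid))
    return attribs
```

Passing the lookup as a callable keeps the mapping testable without a live SGE installation; in the real daemon it would be the bound session method.
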
-------------------------------------------------------------------------
Using
Tomcat but need to do more? Need to support web services, security?
Get stuff
done quickly with pre-integrated technology to make your job easier
Download
IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Oscar-devel
mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
