Hi All:

Here is my update for the past week. I held off on this e-mail until I had 
checked the code into the Subversion repo, which I did a few minutes ago.

So, here is the list of things accomplished:

Mainly, I got the jobmond.py implementation completed and working with SGE. 
Because of some issues with DRMAA in SGE, the current code is not 100% 
DRMAA. From the SGE-devel mailing list, it appears that these issues will be 
sorted out by the 6.0u9 release of SGE. I will keep the pure DRMAA code in 
place, commented out, so we can easily switch to DRMAA as and when it is 
available in SGE. More info about the issue and my discussion on SGE-devel 
can be found at:

http://gridengine.sunsource.net/servlets/ReadMsg?list=dev&msgNo=2771
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1485

Here are some of the notes about the current jobmond.py:

- It collects the SGE job info by running qstat -ext -xml and writing the 
output to a file (whose location is set in jobmond.conf)
- I have implemented an XML parser (based on SAX) for sifting through this 
output and collecting the requisite information about all the jobs. (A DOM 
parser might slow things down when SGE returns verbose XML for a large 
number of jobs, and we don't need write access to the XML tree, so I went 
with SAX.)
- Jobs with no change in their status are reported as such; new jobs are 
added; jobs with changed status are updated
- This information is formed into a dictionary with the job IDs acting as 
keys; the corresponding value holds that job's status and attributes
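A minimal sketch of such a SAX handler follows, assuming the usual layout of qstat -xml output: one job_list element per job, whose state attribute holds pending or running and whose child elements hold fields such as JB_job_number and JB_name. QstatHandler, parse_qstat_xml and the trimmed sample XML are illustrative names and data, not the code in jobmond.py.

```python
import xml.sax


class QstatHandler(xml.sax.ContentHandler):
    """Collects one dict of attributes per <job_list> element, keyed by job ID."""

    def __init__(self):
        super().__init__()
        self.attribs = {}   # job ID -> {field name: value}
        self._job = None    # attributes of the job currently being parsed
        self._text = []     # character data of the current child element

    def startElement(self, name, attrs):
        if name == "job_list":
            # Assumed layout: the state attribute ("pending"/"running")
            # becomes the Status field seen in the example record.
            self._job = {"Status": attrs.get("state", "unknown")}
        self._text = []

    def characters(self, content):
        # SAX may deliver one element's text in several chunks.
        self._text.append(content)

    def endElement(self, name):
        if self._job is None:
            return  # outside any job_list element
        if name == "job_list":
            job_id = self._job.get("JB_job_number")
            if job_id is not None:
                self.attribs[job_id] = self._job
            self._job = None
        else:
            self._job[name] = "".join(self._text).strip()
        self._text = []


def parse_qstat_xml(data):
    """Parse qstat XML (bytes) and return {job ID: attribute dict}."""
    handler = QstatHandler()
    xml.sax.parseString(data, handler)
    return handler.attribs


SAMPLE = b"""<?xml version='1.0'?>
<job_info>
  <queue_info/>
  <job_info>
    <job_list state="pending">
      <JB_job_number>169</JB_job_number>
      <JB_name>Sleeper</JB_name>
      <JB_owner>babu</JB_owner>
      <state>qw</state>
      <queue_name>sample.q</queue_name>
      <slots>1</slots>
    </job_list>
  </job_info>
</job_info>
"""

if __name__ == "__main__":
    jobs = parse_qstat_xml(SAMPLE)
    print(jobs["169"]["Status"], jobs["169"]["state"])  # prints: pending qw
```

Because SAX streams events, memory use stays flat no matter how many job_list elements qstat emits, which is the point of preferring it over DOM here.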

An example is shown below:

The key is '169'

Status=pending
JB_job_number=169
JAT_prio=0.00000
JAT_ntix=0.00000 
JB_name=Sleeper
JB_owner=babu
JB_project=Unknown 
JB_department=defaultdepartment
state=qw
tickets=0
JB_override_tickets=0 
JB_jobshare=0
otickets=0
ftickets=0
stickets=0
JAT_share=0.00000 
queue_name=sample.q
slots=1

This information is collected for every job that is currently under the 
control of SGE.
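The bookkeeping in the third note (new jobs added, jobs with a changed status updated, unchanged jobs left alone) can be sketched as a comparison of two snapshots of that dictionary. Treating a change in either Status or state as a status change, and keeping only the jobs present in the latest snapshot, are assumptions of this sketch; merge_jobs is a hypothetical name.

```python
def _status(attrs):
    """The fields that decide whether a job's status changed (assumed)."""
    return attrs.get("Status"), attrs.get("state")


def merge_jobs(previous, current):
    """Compare a fresh snapshot of jobs with the previous one.

    previous, current: dicts mapping job ID -> dict of job attributes.
    Returns (merged, report): merged holds every job in current, and
    report labels each of them 'new', 'changed' or 'unchanged'.
    """
    merged = {}
    report = {}
    for job_id, attrs in current.items():
        old = previous.get(job_id)
        if old is None:
            report[job_id] = "new"          # first time we see this job
            merged[job_id] = dict(attrs)
        elif _status(old) != _status(attrs):
            report[job_id] = "changed"      # status moved on: take new record
            merged[job_id] = dict(attrs)
        else:
            report[job_id] = "unchanged"    # keep the stored record as is
            merged[job_id] = old
    return merged, report


previous = {"169": {"Status": "pending", "state": "qw"}}
current = {"169": {"Status": "running", "state": "r"},
           "170": {"Status": "pending", "state": "qw"}}
merged, report = merge_jobs(previous, current)
print(report)  # {'169': 'changed', '170': 'new'}
```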

- This information is then multicast using Ganglia's gmetric tool so that 
it is available via the gmond daemons.
- The above steps are repeated every BATCH_POLL_INTERVAL seconds as 
indicated in jobmond.conf
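One collect-and-publish cycle, repeated every BATCH_POLL_INTERVAL seconds, could be sketched as below. It shells out to qstat -ext -xml and to gmetric, using only gmetric's --name, --value and --type options. The metric name JOB-<id>, the key=value string encoding, and the helper names dump_qstat, gmetric_command and poll are hypothetical choices for illustration, not what jobmond.py does.

```python
import subprocess
import time

# qstat options taken from the notes above; the output path is up to the caller.
QSTAT_CMD = ["qstat", "-ext", "-xml"]


def dump_qstat(path):
    """Run qstat -ext -xml and write its raw XML output to path."""
    with open(path, "wb") as out:
        subprocess.run(QSTAT_CMD, stdout=out, check=True)


def gmetric_command(job_id, attrs):
    """Build a gmetric argument list that publishes one job as a string metric.

    Hypothetical encoding: the metric is named JOB-<id> and its value is the
    job's attributes as space-separated key=value pairs, sorted by key.
    """
    value = " ".join("%s=%s" % (key, val) for key, val in sorted(attrs.items()))
    return ["gmetric", "--name=JOB-%s" % job_id,
            "--value=%s" % value, "--type=string"]


def poll(interval, collect, run=subprocess.run, rounds=None):
    """Publish every job returned by collect(), then sleep interval seconds.

    collect stands in for dump_qstat plus the XML parse and must return a
    dict of job ID -> attributes. With rounds=None the loop runs forever,
    as a daemon would; a finite rounds value is handy for testing.
    """
    done = 0
    while rounds is None or done < rounds:
        for job_id, attrs in collect().items():
            run(gmetric_command(job_id, attrs), check=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Passing the command runner in as a parameter keeps the loop testable on a machine without SGE or Ganglia installed.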

Note: Once the DRMAA issue is sorted out, the job status will be obtained 
by establishing a DRMAA session. It would look as below:

import DRMAA

s = DRMAA.Session()
s.init()

job_status = {}
for jobid in self.qstatparser.attribs:
    job_status[jobid] = s.getJobProgramStatus(jobid)
    ## ...code here for translating the DRMAA status code
    ## ...into a string such as queued active, on hold, etc.,
    ## ...and updating the Status key in the attribs dictionary
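The translation step left as a placeholder comment in the DRMAA loop could be a plain lookup table. The numeric codes below are the DRMAA_PS_* job program status values from the DRMAA 1.0 C binding (drmaa.h); whether the Python bindings return these same integers is an assumption to check, and status_to_string, update_status and the label strings are illustrative.

```python
# Job program status codes as defined by the DRMAA 1.0 C binding (drmaa.h).
# Assumption: the Python bindings report the same integer values.
DRMAA_STATUS_NAMES = {
    0x00: "undetermined",
    0x10: "queued active",
    0x11: "system on hold",
    0x12: "user on hold",
    0x13: "user and system on hold",
    0x20: "running",
    0x21: "system suspended",
    0x22: "user suspended",
    0x23: "user and system suspended",
    0x30: "done",
    0x40: "failed",
}


def status_to_string(code):
    """Translate a DRMAA job program status code into a readable label."""
    return DRMAA_STATUS_NAMES.get(code, "unknown (%r)" % (code,))


def update_status(attribs, job_status):
    """Store the translated status under the 'Status' key of each known job."""
    for job_id, code in job_status.items():
        if job_id in attribs:
            attribs[job_id]["Status"] = status_to_string(code)
```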


_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
