Hi All: Here is my update for the past week. I held this e-mail until I had checked the code into the Subversion repo, which I did a few minutes ago.
So, here is the list of things accomplished:

Mainly, I got the jobmond.py implementation completed and working with SGE. Because of some issues with DRMAA in SGE, the current code is not 100% DRMAA. From the SGE-devel mailing list, it appears that these will be sorted out by the 6.0u9 release of SGE. I will keep the pure DRMAA code in place, commented out, so we can easily enable it as and when DRMAA support is available in SGE. More info about the issue and my discussion on SGE-devel can be found at:

http://gridengine.sunsource.net/servlets/ReadMsg?list=dev&msgNo=2771
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1485

Here are some notes about the current jobmond.py:

- It collects the SGE job info by running "qstat -ext -xml" and writes the output to a file (the location of which is set in jobmond.conf).
- I have implemented an XML parser (based on SAX) for sifting through this output and collecting the requisite information about all the jobs. (A DOM parser might slow things down when SGE returns verbose XML for a large number of jobs, and we don't need write access to the XML tree, so I went with SAX.)
- Jobs with no change in their status are reported as such; new jobs are added; jobs with changed status are updated.
- This information is formed into a dictionary with the job IDs as keys and the corresponding values holding the job status. An example is below:

    The key is '169'
    Status=pending
    JB_job_number=169
    JAT_prio=0.00000
    JAT_ntix=0.00000
    JB_name=Sleeper
    JB_owner=babu
    JB_project=Unknown
    JB_department=defaultdepartment
    state=qw
    tickets=0
    JB_override_tickets=0
    JB_jobshare=0
    otickets=0
    ftickets=0
    stickets=0
    JAT_share=0.00000
    queue_name=sample.q
    slots=1

  This info is collected for every job that is currently under the control of SGE.
- This information is then multicast using Ganglia's gmetric tool so that it is available via the gmond daemons.
- The above steps are repeated every BATCH_POLL_INTERVAL seconds, as set in jobmond.conf.

Note: When the DRMAA issue is sorted out, the job status will instead be obtained by establishing a DRMAA session. It would look like this:

    s = DRMAA.Session()
    s.init()
    for jobid in self.qstatparser.attribs:
        job_status[jobid] = s.getJobProgramStatus(jobid)
        ##...Code here for translating the DRMAA status code
        ##...into a string such as queued active, on hold, etc.,
        ##...and update the status key in the attribs dictionary
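
For illustration, the SAX-based parsing step described in the notes could look roughly like the sketch below. This is not the code checked into the repo. It assumes Python 3, and it assumes that each job appears as a <job_list> element whose "state" attribute gives the pending/running status and whose child elements (JB_job_number, JB_owner, and so on) hold the fields. The names QstatHandler, parse_qstat_xml and SAMPLE are made up for this example, and SAMPLE is a hand-written fragment, not captured qstat output.

```python
import xml.sax


class QstatHandler(xml.sax.ContentHandler):
    """Collect one dict of fields per <job_list> element, keyed by job ID."""

    def __init__(self):
        super().__init__()
        self.jobs = {}      # job ID -> {field name: text}
        self._job = None    # fields of the job currently being read
        self._field = None  # name of the child element being read
        self._text = []     # text chunks seen inside that child element

    def startElement(self, name, attrs):
        if name == "job_list":
            # Assumption: the "state" attribute carries pending/running.
            self._job = {"Status": attrs.get("state", "unknown")}
        elif self._job is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "job_list":
            if self._job is not None and "JB_job_number" in self._job:
                self.jobs[self._job["JB_job_number"]] = self._job
            self._job = None
        elif self._job is not None and name == self._field:
            self._job[name] = "".join(self._text).strip()
            self._field = None


def parse_qstat_xml(xml_text):
    """Return {job ID: {field: value}} for the given XML text."""
    handler = QstatHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.jobs


# Hand-written sample in the assumed layout (hypothetical, for illustration).
SAMPLE = """<job_info>
  <job_list state="pending">
    <JB_job_number>169</JB_job_number>
    <JB_name>Sleeper</JB_name>
    <JB_owner>babu</JB_owner>
    <state>qw</state>
    <slots>1</slots>
  </job_list>
</job_info>"""

jobs = parse_qstat_xml(SAMPLE)
```

Because the handler only keeps the job currently being read plus the result dictionary, memory use stays proportional to the number of jobs rather than to the size of the XML tree, which is the reason given above for picking SAX over DOM.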
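
The new/changed/unchanged bookkeeping mentioned in the notes can be sketched as a comparison between the dictionary from the previous poll and the one from the current poll. This is only a sketch under assumptions: the function name classify_jobs is hypothetical, and comparing the whole per-job record (rather than just one status field) is my guess at what counts as a change.

```python
def classify_jobs(previous, current):
    """Split the job IDs in `current` into new, changed and unchanged.

    Both arguments map job ID -> dict of job fields, as produced by
    one poll of the batch system.
    """
    new, changed, unchanged = [], [], []
    for job_id, info in current.items():
        if job_id not in previous:
            new.append(job_id)
        elif previous[job_id] != info:
            # Assumption: any differing field counts as a change.
            changed.append(job_id)
        else:
            unchanged.append(job_id)
    return new, changed, unchanged


# Two consecutive polls with made-up data.
previous = {"169": {"state": "qw"}, "170": {"state": "r"}}
current = {"169": {"state": "r"}, "170": {"state": "r"}, "171": {"state": "qw"}}
new, changed, unchanged = classify_jobs(previous, current)
```

Jobs that were in the previous poll but are missing from the current one (finished or deleted) are not handled here; the notes above do not say how those are treated.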
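
The gmetric step could be sketched as building one gmetric command line per job and handing it to a runner. The --name, --value and --type options are standard gmetric options. Everything else here is an assumption made for the example: the "SGEJOB_" metric-name prefix, the "key=value" encoding of the job record, and the function names gmetric_command and publish. The actual jobmond.py may encode things differently.

```python
import subprocess


def gmetric_command(job_id, info, gmetric="gmetric"):
    """Build the argv list for announcing one job record via gmetric."""
    # Assumption: the record is flattened to "key=value" pairs, sorted by key.
    value = " ".join(f"{key}={val}" for key, val in sorted(info.items()))
    return [
        gmetric,
        "--name", f"SGEJOB_{job_id}",  # hypothetical metric naming scheme
        "--value", value,
        "--type", "string",
    ]


def publish(jobs, run=subprocess.call):
    """Call `run` once per job; by default this executes gmetric."""
    for job_id, info in jobs.items():
        run(gmetric_command(job_id, info))


# Dry run: collect the commands instead of executing them.
issued = []
publish({"169": {"state": "qw", "slots": "1"}}, run=issued.append)
```

Passing the runner in as a parameter keeps the sketch testable on a machine without Ganglia installed; with the default, each command is executed through subprocess.call.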
