Hi Babu:
Good stuff - sounds like you got a lot going there.
I forwarded the email to Ramon and Bas at SARA - I am sure they are interested in learning about your progress.
You mentioned that you were having problems network booting your nodes - what's the problem specifically - perhaps we can help... unless it has to do with something at UH :-)
P.S. There should be a DRMAA implementation for TORQUE - are there any plans to use that?
Cheers,
Bernard
From: [EMAIL PROTECTED] on behalf of Babu Sundaram
Sent: Sun 25/06/2006 10:25
To: [email protected]
Subject: [Oscar-devel] Mid-term Progress Update for SoC 2006 Project -HPCMetrics in OSCAR
Hi All:
Please find below a mid-term update on my SoC work so far. Let me know if you guys have any comments/suggestions.
Mid-term Progress Update for SoC 2006 - HPCMetrics in OSCAR
================================================
Summary of work accomplished so far:
1. New addons for Ganglia with libe, authd and gexec
2. Modified Ganglia-OSCAR package with gexec support
3. DRMAA-Python OSCAR package
4. Modified implementation of JobMonarch to facilitate integration of SGE via DRMAA
The latest code and the SRPMs and binary RPMs (for FC4-i386 and FC5-i386) are available at OSCAR repository under
.../oscar-soc/soc-2006/hpcmetrics
Note: The JobMonarch code is not on the SVN yet.
Weekly tasks:
Week 1: May 24th - May 31st
- Completed Ganglia compilation with gexec
- Building libe, authd, gexec
There were some problems getting the correct versions of the above
that work correctly with latest Ganglia 3.0.x
- Identified the correct versions of the components above for building
- Wrote correct spec files for libe(0.3.0), authd(0.2.2) and gexec(0.3.6)
- Successfully built the RPMs and SRPMs on FC4-i386
- Some portions of the gexec implementation were still using the old Ganglia 2.x
Week 2: Jun 1st - 7th
- Implemented patches to gexec-0.3.6 so it built correctly with Ganglia 3.0.x
Modified the paths to header files
Added the requirement for ganglia-devel and libe >= 0.3.0
Added the linking to expat library
- Created the updated spec file for gexec
- Got SVN access to OSCAR repository; Created hpcmetrics dir for the SoC code
- Completed a test bed setup in UH using FC5 on i386 with OSCAR 5.0 from trunk
- Rebuilt all the RPMs for FC5
Week 3: Jun 8th - 15th
- Made changes to Ganglia's spec file - to allow gexec support
--enable-gexec as part of configure phase in ganglia build
- Tested the modified Ganglia package on OSCAR cluster on Master node
- Brushed up on my Python knowledge to start work with JobMonarch
- Read up on DRMAA, obtained some familiarity with DRMAA-Python implementation
Week 4: Jun 15th - 22nd
- Built DRMAA Python on FC5-i386 with SGE's C bindings as the DRM
- Created DRMAA python spec file for building RPMs; Requires DRMAA
- Modified SGE-OSCAR package spec so it provides DRMAA that is required by DRMAA-Python
- Created RPMs, SRPM for DRMAA-Python-0.2
- Preliminary tests to monitor SGE jobs via DRMAA API
- OSCAR Package for DRMAA-Python
- Renamed the authd RPMs to gexec-authd to avoid a conflict with the RFC 1413 identd daemon (also called authd)
Otherwise, the identd daemon RPM was installed instead of our authd ahead of gexec
Week 5: Jun 23rd - today
- Changes to JobMonarch implementation were requested from Ramon
An 'if' test is added to check whether to use pbs interface or DRMAA's
- Support was added to express the Interface needed as part of Monarch's config file
- BATCH_API option; When set to DRMAA it will use the Python binding (onto SGE's C binding)
- Some unexpected delays this week:
A few servers at my department at Univ of Houston were compromised by external access
DRMAA API issues - could submit jobs to OSCAR-SGE, but the wait() call on SGE jobs fails with a ValueError
Need to clarify with SGE developers
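For illustration, the BATCH_API switch described above could look roughly like the sketch below. This is not the JobMonarch code: the class names, the "KEY = value" config syntax and the default of pbs are all assumptions made up for the example, and the two backends are empty stand-ins.

```python
# Hypothetical sketch of a config-driven batch API switch.
# None of the names below are taken from the JobMonarch source.


class PBSBackend:
    """Stand-in for the existing pbs interface."""
    name = "pbs"

    def list_jobs(self):
        # A real backend would query the PBS/TORQUE server here.
        return []


class DRMAABackend:
    """Stand-in for the DRMAA-Python path (on top of SGE's C binding)."""
    name = "drmaa"

    def list_jobs(self):
        # A real backend would go through a DRMAA session here.
        return []


def parse_config(text):
    """Parse 'KEY = value' lines, skipping blanks and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def select_backend(config):
    """The 'if' test: pick the interface named by BATCH_API."""
    api = config.get("BATCH_API", "pbs").strip().lower()
    if api == "drmaa":
        return DRMAABackend()
    if api == "pbs":
        return PBSBackend()
    raise ValueError("unknown BATCH_API value: %r" % api)


if __name__ == "__main__":
    backend = select_backend(parse_config("BATCH_API = DRMAA"))
    print(backend.name)
```

Keeping the test in one function means the rest of the daemon only ever sees an object with list_jobs(), whichever DRM is behind it.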
=============================
*** Some issues currently ***
- Having some trouble in network booting the client nodes in OSCAR cluster
So testing of client side install of gexec and DRMAA remains; Hopefully should be resolved this coming week
- Cannot access the testbed within the Computer Science department @ UH due to a complete rebuild of the systems; should be ready sometime this week
=============================
Plan for the next 3 weeks:
Week 6: Jun 26th - Jul 3rd
DRMAA-Monarch integration completion
Refine the gexec and authd package and test them
- need to add a few additions to post_install scripts
for RSA key setup via authd on master node and copy it
out to client nodes for transparent gexec
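As a rough sketch of what such a post_install step might do, the snippet below only builds the command lines (it does not run them). The file names auth_priv.pem and auth_pub.pem, the /etc destination, the use of scp and the choice to copy both files are assumptions for illustration; which key each node actually needs should be checked against the authd documentation.

```python
# Hypothetical dry-run sketch: builds the commands, does not execute them.
import shlex


def keygen_commands(priv="auth_priv.pem", pub="auth_pub.pem", bits=2048):
    """Commands to create an RSA key pair with the openssl CLI."""
    return [
        ["openssl", "genrsa", "-out", priv, str(bits)],
        ["openssl", "rsa", "-in", priv, "-pubout", "-out", pub],
    ]


def copy_commands(files, nodes, dest_dir="/etc"):
    """One scp command per client node, copying all given files."""
    return [["scp", "-p", *files, "%s:%s/" % (node, dest_dir)]
            for node in nodes]


def render(commands):
    """Shell-quoted text form of the commands, one per line."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    files = ["auth_priv.pem", "auth_pub.pem"]
    nodes = ["oscarnode1", "oscarnode2"]  # hypothetical node names
    print(render(keygen_commands() + copy_commands(files, nodes)))
```

Building the list first makes it easy to print and review the commands before wiring them into the real script.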
Weeks 7 & 8: Jul 4th - 19th
Obtain sensord from Erich and start working on it for adding timing control support
Identifying mechanisms to collect DRM job breakdown, network and disk statistics
Integrate these tools with sensord (either as separate programs or as addons to Monarch)
- needs to be discussed with Erich
JobMonarch package for OSCAR
The last 4 weeks will be spent on extensions to the Ganglia interface (3 weeks) and
documentation of the summer's work
Regards,
Babu
Note: My access to [EMAIL PROTECTED] is temporarily unavailable. Once our access to the servers is restored, I will post the updates on my webpage once a week so you all can take a look when you get the time. Thanks.
_______________________________________________ Oscar-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/oscar-devel
