From: [EMAIL PROTECTED] on behalf of Erich Focht
Sent: Mon 26/06/2006 03:40
To: [email protected]
Subject: Re: [Oscar-devel] Mid-term Progress Update for SoC 2006 Project -HPCMetrics in OSCAR
Hi Babu,
thanks for the detailed report. Progress looks good and I am looking forward to
the first JobMonarch checkins.
To Bernard's question: Babu once explained that Torque's DRMAA support is not
complete. Once it is, JobMonarch can switch over to DRMAA for Torque, too.
Best regards,
Erich
On Sunday 25 June 2006 19:25, Babu Sundaram wrote:
> Hi All:
>
> Please find below a mid-term update on my SoC work so far. Let me know if
> you guys have any comments/suggestions.
>
> Mid-term Progress Update for SoC 2006 - HPCMetrics in OSCAR
> ================================================
> Summary of work accomplished so far:
>
> 1, New addons for Ganglia with libe, authd and gexec
> 2, Modified Ganglia-OSCAR package with gexec support
> 3, DRMAA-Python OSCAR package
> 4, Modified implementation of JobMonarch to facilitate integration of SGE
> via DRMAA
>
> The latest code and the SRPMs and binary RPMs (for FC4-i386 and FC5-i386)
> are available in the OSCAR repository under
> .../oscar-soc/soc-2006/hpcmetrics
>
> Note: The JobMonarch code is not in SVN yet.
>
> Weekly tasks:
>
> Week 1: May 24th - May 31st
> - Completed Ganglia compilation with gexec
> - Building libe, authd, gexec
>   There were some problems getting the correct versions of the above
>   that work correctly with latest Ganglia 3.0.x
> - Identified the correct versions of the components above for building
> - Wrote correct spec files for libe (0.3.0), authd (0.2.2) and gexec (0.3.6)
> - Successfully built the RPMs and SRPMs on FC4-i386
> - Some portions of the gexec implementation were using the old
>   Ganglia 2.x
>
> Week 2: Jun 1st - 7th
> - Implemented patches to gexec-0.3.6 so it built correctly with Ganglia
>   3.0.x
>     Modified the paths to header files
>     Added the requirement for ganglia-devel and libe >= 0.3.0
>     Added the linking to expat library
> - Created the updated spec file for gexec
> - Got SVN access to OSCAR repository; Created hpcmetrics dir for the SoC
>   code
> - Completed a test bed setup in UH using FC5 on i386 with OSCAR 5.0 from
>   trunk
> - Rebuilt all the RPMs for FC5
>
> Week 3: Jun 8th - 15th
> - Made changes to Ganglia's spec file - to allow gexec support
>     --enable-gexec as part of configure phase in ganglia build
> - Tested the modified Ganglia package on OSCAR cluster on Master node
> - Brushed up on my Python knowledge to start work with JobMonarch
> - Read up on DRMAA, obtained some familiarity with DRMAA-Python
>   implementation
>
> Week 4: Jun 15th - 22nd
> - Built DRMAA-Python on FC5-i386 with SGE's C bindings as the DRM
> - Created DRMAA-Python spec file for building RPMs; Requires DRMAA
> - Modified SGE-OSCAR package spec so it provides DRMAA that is required by
>   DRMAA-Python
> - Created RPMs, SRPM for DRMAA-Python-0.2
> - Preliminary tests to monitor SGE jobs via DRMAA API
> - OSCAR Package for DRMAA-Python
> - Renamed authd RPMS to gexec-authd to avoid conflict with RFC 1413 identd
>   daemon (also called authd)
>     Otherwise, the identd daemon RPM was installed instead of authd prior to
>     gexec
>
> Week 5: Jun 23rd - today
> - Changes to JobMonarch implementation were requested from Ramon
>     An 'if' test is added to check whether to use the pbs interface or DRMAA's
> - Support was added to express the interface needed as part of Monarch's
>   config file
>     - BATCH_API option; when set to DRMAA it will use the Python binding (onto
>       SGE's C binding)
> - Some unexpected delays this week
>     A few servers were compromised by external access at my department in Univ
>     of Houston
>     DRMAA API issues - Could submit jobs to OSCAR-SGE, but the wait() call on SGE
>     jobs fails due to ValueError
>     Need to clarify with SGE developers
>
> =============================
> *** Some issues currently ***
> - Having some trouble in network booting the client nodes in OSCAR cluster
>     So testing of client side install of gexec and DRMAA remains; Hopefully
>     should be resolved this coming week
> - Cannot access the testbed within the Computer Science @ UH due to complete
>   rebuild of systems; Should be ready sometime this week
> =============================
>
> Plan for the next 3 weeks:
>
> Week 6: Jun 26th - Jul 3rd
> DRMAA-Monarch integration completion
> Refine the gexec and authd package and test them
>     - need to add a few additions to post_install scripts
>       for RSA key setup via authd on master node and copy it
>       out to client nodes for transparent gexec
>
> Weeks 7 & 8: Jul 4th - 19th
> Obtain sensord from Erich and start working on it for adding timing control
> support
> Identifying mechanisms to collect DRM job breakdown, network and disk
> statistics
> Integrate these tools with sensord (either as separate programs or as addons
> to Monarch)
>     - needs to be discussed with Erich
> JobMonarch package for OSCAR
>
> And the last 4 weeks would be spent on extensions to the Ganglia interface (3
> weeks) and documentation of the work done this summer
>
> Regards,
> Babu
>
> Note: My access to [EMAIL PROTECTED] is temporarily unavailable. Once we get
> our access to the servers restored, I will post the weekly updates on my
> webpage once a week so you all can take a look when you get the time.
> Thanks.
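
For readers following the Week 5 item, the BATCH_API switch between the pbs
interface and DRMAA might look roughly like the sketch below. This is not the
actual JobMonarch code (which, per the report, is not in SVN yet). The
[DEFAULT] section, the accepted values "pbs" and "drmaa", the default of
"pbs", and the function names read_batch_api and select_backend are all
assumptions made for illustration; only the BATCH_API option name and the
pbs-versus-DRMAA 'if' test come from the report.

```python
import configparser

# Hypothetical set of accepted values; the report only names pbs and DRMAA.
SUPPORTED_APIS = ("pbs", "drmaa")


def read_batch_api(config_text, default="pbs"):
    """Return the normalized BATCH_API value from an INI-style config string.

    The [DEFAULT] section and the fallback value are assumptions; the real
    Monarch config file layout may differ.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    value = parser.get("DEFAULT", "BATCH_API", fallback=default)
    value = value.strip().lower()
    if value not in SUPPORTED_APIS:
        raise ValueError("unsupported BATCH_API value: %r" % value)
    return value


def select_backend(batch_api):
    """The 'if' test from the report: choose between pbs and DRMAA.

    Returns a description string only; a real daemon would import and
    initialize the matching binding at this point.
    """
    if batch_api == "drmaa":
        return "DRMAA (Python binding onto SGE's C binding)"
    return "pbs interface"


if __name__ == "__main__":
    sample = "[DEFAULT]\nBATCH_API = DRMAA\n"
    print(select_backend(read_batch_api(sample)))
```

Keeping the choice in the config file means Torque could later be moved to
DRMAA, as discussed above, by changing one option rather than the code.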
_______________________________________________
Oscar-devel
mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
