Re: [OMPI devel] Fwd: lsf support / farm use models

2007-07-23 Thread Bill McMillan
Matt,

Sorry for the delay in replying.

>first of all, thanks for the info bill! i think i'm really starting to
>piece things together now. you are right in that i'm working with a 6.x
>(6.2 with 6.1 devel libs ;) install here at cadence, without the HPC
>extensions AFAIK. also, i think that our customers are mostly in the same
>position -- i assume that the HPC extensions cost extra? or perhaps
>admins just don't bother to install them.

 Since most apps in EDA are sequential, most admins haven't installed
 the extensions.

>i'll try to gather more data, but my feeling is that the market
>penetration of both HPC and LSF 7.0 is low in our market (EDA vendors
>and customers). i'd love to just stall until 7.0 is widely available,
>but perhaps in the meantime it would be nice to have some backward
>support for LSF 6.0 'base'. it seems like supporting LSF 6.x /w HPC
>might not be too useful, since:
>a) it's not clear that the 'built in' "bsub -n N -a openmpi foo"
>support will work with an MPI-2 dynamic-spawning application like mine
>(or does it?),

 From an LSF perspective, you get allocated N slots, and how the
 application uses them is pretty much at its discretion.  So in this case
 openmpi would start orted on each allocated node, and you can create
 whatever dynamic processes you like from your openmpi app within that
 allocation.

 At present the allocation is fixed, but support for changing it will
 arrive in a forthcoming release.

>b) i've heard that manually interfacing with the parallel application
>manager directly is tricky?

 If you don't use the supplied methods (such as the -a openmpi method)
then it can be a little tricky to
 set it up the first time.

>c) most importantly, it's not clear that any of our customers have the
>HPC support, and certainly not all of them, so i need to support LSF 6.0
>'base' anyway -- it only needs to work until 7.0 is widely available
>(< 1 year? i really have no idea ... will Platform end support for 6.x
>at some particular time? or otherwise push customers to upgrade? perhaps
>cadence can help there too ...) .

 The -a openmpi method works with LSF 6.x, and will be supported until at
 least the end of the decade.  It sounds like the simplest solution may be
 to make the HPC extensions available as a patch kit for everyone.

>1) use bsub -n N, followed by N-1 ls_rtaske() calls (or similar).
>while ls_rtaske() may not 'force' me to follow the queuing rules, if i
>only launch on the proper machines, i should be okay, right? i don't
>think IO and process marshaling (i'm not sure exactly what you mean by
>that) are a problem since openmpi/orted handles those issues, i think?

 Yes, it will work; however, it has two drawbacks:
 * In this model you essentially become responsible for error handling
   if a remote task dies, and for cleaning up gracefully if the master
   process dies.
 * From a process accounting (and hence scheduling) point of view,
   resources consumed by the remote tasks are not attributed to the
   master task.
 The -a openmpi method (and blaunch) handles both of these cases.
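
 Purely as an illustrative sketch (not code shipped with LSF or openmpi),
 option (1) could look roughly like the following.  It assumes the
 allocation shows up in LSB_MCPU_HOSTS in the usual "hostA 2 hostB 1 ..."
 form and uses the classic LSLIB remote-task call; check the exact
 prototypes against your 6.x lsf.h, and note the orted command line here
 is only a placeholder:

  /* sketch only: start one task per allocated slot from inside a
   * "bsub -n N" job, using LSB_MCPU_HOSTS and ls_rtaske().
   * The error handling / cleanup / accounting drawbacks noted above
   * are deliberately not addressed here. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <lsf/lsf.h>              /* ls_initrex(), ls_rtaske(), ls_perror() */

  int main(void)
  {
      char *alloc = getenv("LSB_MCPU_HOSTS");   /* e.g. "hostA 2 hostB 1" */
      if (!alloc) {
          fprintf(stderr, "no LSB_MCPU_HOSTS -- not inside a bsub -n job?\n");
          return 1;
      }
      if (ls_initrex(1, 0) < 0) {               /* init remote execution */
          ls_perror("ls_initrex");
          return 1;
      }

      char *copy = strdup(alloc);
      /* the first entry is normally the host the job itself starts on;
       * a real launcher would account for the local process there */
      for (char *host = strtok(copy, " \t"); host; host = strtok(NULL, " \t")) {
          char *count = strtok(NULL, " \t");
          int slots = count ? atoi(count) : 0;

          char *orted_argv[] = { "orted", NULL };   /* placeholder argv */
          for (int i = 0; i < slots; i++) {
              if (ls_rtaske(host, orted_argv, 0, NULL) < 0)
                  ls_perror("ls_rtaske");
          }
      }
      free(copy);
      return 0;
  }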

>2) use only bsub's of single processes, using some initial wrapper
>script that bsub's all the jobs (master + N-1 slaves) needed to reach
>the desired static allocation for openmpi. this seems to be what my
>internal guy is suggesting is 'required'.

 Again, this will work, though you may not be too popular with your cluster
 admin if you are holding onto N-1 cpus while waiting for the Nth to be
 allocated.  Method (1) would be viewed as a true parallel job and could be
 backfilled, while (2) is just a loose collection of sequential tasks.  This
 would also suffer from the same drawbacks as (1).

 If your application can start with just 1 cpu and then deal with the rest
 as they are added, you will keep the cluster admin happy.


 I guess this discussion is becoming very LSF-specific; if you would
 prefer to discuss it offline, please let me know.

 Cheers
 Bill



Re: [OMPI devel] Fwd: lsf support / farm use models

2007-07-18 Thread Matthew Moskewicz

hi,

first of all, thanks for the info bill! i think i'm really starting to
piece things together now. you are right in that i'm working with a
6.x (6.2 with 6.1 devel libs ;) install here at cadence, without the
HPC extensions AFAIK. also, i think that our customers are mostly in
the same position -- i assume that the HPC extensions cost extra? or
perhaps admins just don't bother to install them.

so, there are at least three cases to consider:
LSF 7.0 or greater
LSF 6.x /w HPC
LSF 6.x 'base'

i'll try to gather more data, but my feeling is that the market
penetration of both HPC and LSF 7.0 is low in our market (EDA vendors
and customers). i'd love to just stall until 7.0 is widely available,
but perhaps in the meantime it would be nice to have some backward
support for LSF 6.0 'base'. it seems like supporting LSF 6.x /w HPC
might not be too useful, since:
a) it's not clear that the 'built in' "bsub -n N -a openmpi foo"
support will work with an MPI-2 dynamic-spawning application like mine
(or does it?),
b) i've heard that manually interfacing with the  parallel application
manager directly is tricky?
c) most importantly, it's not clear that any of our customers have the
HPC support, and certainly not all of them, so i need to support LSF
6.0 'base' anyway -- it only needs to work until 7.0 is widely
available (< 1 year? i really have no idea ... will Platform end
support for 6.x at some particular time? or otherwise push customers
to upgrade? perhaps cadence can help there too ...) .

under LSF 7.0 it looks like things are okay and that open-mpi will
support it in a released version 'soon' (< 6 months? ). sooner than
our customers will have LSF 7.0 anyway, so that's fine.

as for LSF 6.0 'base', there are two workarounds that i see, and a
couple key questions that remain:

1) use bsub -n N, followed by N-1 ls_rtaske() calls (or similar).
while ls_rtaske() may not 'force' me to follow the queuing rules, if i
only launch on the proper machines, i should be okay, right? i don't
think IO and process marshaling (i'm not sure exactly what you mean by
that) are a problem since openmpi/orted handles those issues, i think?

2) use only bsub's of single processes, using some initial wrapper
script that bsub's all the jobs (master + N-1 slaves) needed to reach
the desired static allocation for openmpi. this seems to be what my
internal guy is suggesting is 'required'. integration with openmpi
might not be too hard, using suitable trickery. for example, the
wrapper script launches some wrapper processes that are basically
rexec daemons. the master waits for them to come up in the ras/lsf
component (tcp notify, perhaps via the launcher machine to avoid
needing to know the master hostname a priori), and then the pls/lsf
component uses the thin rexec daemons to launch orted. seems like a
bit of a silly workaround, but it does seem to both keep the queuing
system happy as well as not need ls_rtaske() or similar.

[ Note: (1) will fail if admins disable the ls_rexec() type of
functionality, but on an LSF 6.0 'base' system, this would seem to
disable all parallel job launching -- i.e. the shipped mpijob/pvmjob all
use lsgrun and such, so they would be disabled -- is there any other way
i could start the sub-processes within my allocation in that case? can i
just have bsub start N copies of something (maybe orted?)? that seems
like it might be hard to integrate with openmpi, though -- in that
case, i'd probably just implement option (2)]
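
to make the daemon idea in (2) a bit more concrete, here's a rough sketch
(entirely made up, not from the openmpi tree) of what such a 'thin rexec
daemon' could look like -- it reports where it is listening (the real
scheme would push that to the launcher machine over tcp) and then execs
whatever single command line it is sent, e.g. the orted launch line:

  /* hypothetical "thin rexec daemon" for workaround (2): each slave job
   * bsub'd by the wrapper script runs this, announces its host/port, and
   * then becomes the command it is handed. */
  #include <stdio.h>
  #include <unistd.h>
  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <sys/socket.h>

  int main(void)
  {
      int lsock = socket(AF_INET, SOCK_STREAM, 0);
      struct sockaddr_in addr = { 0 };
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      addr.sin_port = 0;                         /* let the kernel pick a port */
      bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
      listen(lsock, 1);

      socklen_t len = sizeof(addr);
      getsockname(lsock, (struct sockaddr *)&addr, &len);
      char host[256];
      gethostname(host, sizeof(host));
      printf("%s %d\n", host, ntohs(addr.sin_port)); /* notify step, simplified */
      fflush(stdout);

      int csock = accept(lsock, NULL, NULL);
      char cmd[4096];
      ssize_t n = read(csock, cmd, sizeof(cmd) - 1);
      if (n <= 0)
          return 1;
      cmd[n] = '\0';

      /* become the requested process (e.g. orted with its arguments) */
      execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
      perror("execl");
      return 1;
  }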

Matt.

On 7/17/07, Bill McMillan  wrote:





> there appear to be some overlaps between the ls_* and lsb_* functions,
> but they seem basically compatible as far as i can tell. almost all
> the functions have a command line version as well, for example:
> lsb_submit()/bsub

  Like openmpi and orte, there are two layers in LSF.  The ls_* APIs
  talk to what is/was historically called "LSF Base" and the lsb_* APIs
  talk to what is/was historically called "LSF Batch".
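
  For illustration, a program that touches both layers typically
  initializes each side separately -- roughly like this (check the exact
  prototypes in your local lsf.h / lsbatch.h):

   /* sketch: bring up both LSF layers in one program */
   #include <lsf/lsf.h>       /* base layer:  ls_*  (remote tasks etc.) */
   #include <lsf/lsbatch.h>   /* batch layer: lsb_* (queues, jobs)      */

   int init_both_layers(void)
   {
       if (lsb_init("my_app") < 0) {      /* batch ("LSF Batch") side */
           lsb_perror("lsb_init");
           return -1;
       }
       if (ls_initrex(1, 0) < 0) {        /* base ("LSF Base") remote exec */
           ls_perror("ls_initrex");
           return -1;
       }
       return 0;
   }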

[SNIP]

  Regards,
  Bill


-
Bill McMillan
Principal Technical Product Manager
Platform Computing



Re: [OMPI devel] Fwd: lsf support / farm use models

2007-07-15 Thread Ralph Castain
Hi Matt


On 7/15/07 1:49 PM, "Matthew Moskewicz" 
wrote:

> 
>> Welcome! Yes, Jeff and I have been working on the LSF support based on 7.0
>> features in collab with the folks at Platform.
> 
> sounds good. i'm happy to be involved with such a nice active project!
>  
>>> 1) it appears that you (jeff, i guess ;) are using new LSF 7.0 API features.
>>> i'm working to support customers in the EDA space, and it's not clear
>>> if/when
>>> they will migrate to 7.0 -- not to mention that our company (cadence)
>>> doesn't 
>>> appear to have LSF 7.0 yet. i'm still looking into the details, but it
>>> appears that (from the Platform docs) lsb_getalloc is probably just a thin
>>> wrapper around the LSB_MCPU_HOSTS (spelling?) environment variable. so that
>>> could be worked around fairly easily. i dunno about lsb_launch -- it seems
>>> equivalent to a set of ls_rtask() calls (one per process). however, i have
>>> heard that there can be significant subtleties with the semantics of these
>>> functions, in terms of compatibility across differently configured
>>> LSF-controlled farms, specifically with regards to administrators' ability to
>>> track and control job execution. personally, i don't see how it's really
>>> possible for LSF to prevent 'bad' users from spamming out jobs or
>>> short-cutting queues, but perhaps some of the methods they attempt to use
>>> can
>>> complicate things for a library like open-rte.
>> 
>> After lengthy discussions with Platform, it was deemed that the best path
>> forward is to use the lsb_getalloc interface. While it currently reads the
>> environment variable, they indicated a potential change to read a file
>> instead for scalability. Rather than chasing any changes, we all agreed that
>> using lsb_getalloc would remain the "stable" interface - so that is what we
>> used.
> 
> understood.
> 
>> Similar reasons for using lsb_launch. I would really advise against making
>> any changes away from that support. Instead, we could take a lesson from our
>> bproc support and simply (a) detect if we are on a pre-7.0 release, and then
>> (b) build our own internal wrapper that provides back-support. See the bproc
>> pls component for examples.
> 
> that sounds fine -- should just be a matter of a little configure magic,
> right? i already had to change the current configure stuff to be able to build
> at all under 6.2 (since the current configure check requires 7.0 to pass), so
> i guess it shouldn't be too much harder to mimic the bproc method of detecting
> multiple versions, assuming it's really the same sort of thing. basically, i'd
> keep the main LSF configure check downgraded as i have currently done in my
> working copy, but add a new 7.0 check that is really the current trunk check.
> 
> then, i'll make signature-compatible replacements (with the same names? or add
> internal functions to abstract things? or just add #ifdef's inline where they
> are used?) for each missing LSF 7.0 function (implemented using the 6.1 or 6.2
> API), and have configure only build them if the system LSF doesn't have them.
> uhm, once i figure out how to do that, anyway ... i guess i'll ask for more
> help if the bproc code doesn't enlighten me. if successful, i should be able
> to track trunk easily with respect to the LSF version issue at least.
> 

This sounds fine - you'll find that the bproc pls does the exact same thing.
In that case, we use #ifdefs since the APIs are actually different between
the versions - we just create a wrapper inside the bproc pls code for the
older version so that we can always call the same API. I'm not sure what the
case will be in LSF - I believe the function calls are indeed different, so
you might be able to use the same approach.
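
For illustration only, a back-support shim along those lines might look
roughly like the sketch below. The HAVE_LSB_LAUNCH guard is a made-up
configure macro, the function names are placeholders, and the pre-7.0
calls should be checked against the actual 6.x headers - this is just the
shape of the thing, not code from the bproc pls or anywhere else:

  /* if configure did not find the 7.0 lsb_launch()/lsb_getalloc() calls,
   * provide rough equivalents built on what 6.x already offers */
  #include <stdlib.h>
  #include <string.h>
  #include <lsf/lsf.h>

  #if !defined(HAVE_LSB_LAUNCH)

  /* expand LSB_MCPU_HOSTS ("hostA 2 hostB 1 ...") into one entry per slot;
   * returns the slot count and fills *hosts with a NULL-terminated array */
  static int compat_getalloc(char ***hosts)
  {
      char *env = getenv("LSB_MCPU_HOSTS");
      if (!env) return -1;

      char *copy = strdup(env);
      char **list = NULL;
      int nslots = 0;

      for (char *h = strtok(copy, " \t"); h; h = strtok(NULL, " \t")) {
          char *cnt = strtok(NULL, " \t");
          int n = cnt ? atoi(cnt) : 0;
          list = realloc(list, (nslots + n + 1) * sizeof(char *));
          for (int i = 0; i < n; i++)
              list[nslots++] = strdup(h);
      }
      if (list) list[nslots] = NULL;
      free(copy);
      *hosts = list;
      return nslots;
  }

  /* start argv once per entry in the host list via the 6.x remote-task call */
  static int compat_launch(char **hostlist, char **argv, char **envp)
  {
      if (ls_initrex(1, 0) < 0)
          return -1;
      int started = 0;
      for (int i = 0; hostlist && hostlist[i]; i++) {
          if (ls_rtaske(hostlist[i], argv, 0, envp) < 0)
              return -1;
          started++;
      }
      return started;
  }

  #endif /* !HAVE_LSB_LAUNCH */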

> i'll probably just continue experimenting on my own for the moment (tracking
> any updates to the main trunk LSF support) to see if i can figure it out. any
> advice on the best way to get such back support into trunk, if and when it exists
> / is working? 

The *best* way would be for you to sign a third-party agreement - see the
web site for details and a copy. Barring that, the only option would be to
submit the code through either Jeff or me. We greatly prefer the agreement
method as it (a) is less burdensome on us and (b) gives you greater
flexibility.

>
> 
>>> 
>>> 2) this brings us to point 2 -- upon talking to the author(s) of cadence's
>>> internal open-rte-like library, several key issues were raised. mainly,
>>> customers want their applications to be 'farm-friendly' in several key ways.
>>> firstly, they do not want any persistent daemons running outside of a given
>>> job -- this requirement seems met by the current open-mpi default behavior,
>>> at least as far as i can tell. secondly, they prefer (strongly) that applications
>>> acquire resources incrementally, and perform work with whatever nodes are
>>> currently available, rather than forcing a large up-front node allocation.
>>> fault tolerance is