On Nov 22, 2005, at 9:45 AM, Lombard, David N wrote:

1. First is due to the nature of the disk boot model.  There's no low
level way to know why a node doesn't boot.  If one doesn't come up
then putting a monitor on it is required.  I don't know what you would
do about it.  It's pretty low level.  Finding a third-party solution
and bundling it in would be nice if possible.

Some sort of remote console access is needed.  Could be as simple as a
serial line with BIOS console redirect, one of those KVM-over-IP
thingies, or SOL (Serial Over LAN, an IPMI 2.0 feature).  At the end of
the day, hardware must provide a part of the answer.

I agree. But are there 10% solutions that OSCAR can offer? I think that's the real question here. Is there any kind of feedback -- however rudimentary -- that the node can provide to give some kind of indication about why it failed? Perhaps even the following messages:

1. I failed, but if you reboot me, it might work
2. I failed, but if you reboot me, it should work
3. I failed, and have no idea why. You need to attach a monitor and find out.

Something simple like that -- even if the predominant answer we can give is #3, that could be helpful.
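
To make the three-message idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `BootFailure` enum, the numeric codes, and the `describe` helper are made up for illustration and are not part of OSCAR); it only shows that anything the node cannot classify falls back to #3.

```python
from enum import Enum


class BootFailure(Enum):
    """Hypothetical status codes mirroring the three messages above."""
    RETRY_MIGHT_WORK = 1
    RETRY_SHOULD_WORK = 2
    UNKNOWN = 3


# Human-readable text for each code (wording taken from the list above).
MESSAGES = {
    BootFailure.RETRY_MIGHT_WORK: "I failed, but if you reboot me, it might work",
    BootFailure.RETRY_SHOULD_WORK: "I failed, but if you reboot me, it should work",
    BootFailure.UNKNOWN: (
        "I failed, and have no idea why. "
        "You need to attach a monitor and find out."
    ),
}


def describe(raw_code: int) -> str:
    """Map a raw code reported by a node to a message.

    Any code we do not recognize is treated as UNKNOWN, so the
    worst case is still message #3 rather than no answer at all.
    """
    try:
        failure = BootFailure(raw_code)
    except ValueError:
        failure = BootFailure.UNKNOWN
    return MESSAGES[failure]


if __name__ == "__main__":
    for code in (1, 2, 3, 99):
        print(code, "->", describe(code))
```

How the node would actually get such a code back to the head node is left open here; that part still depends on what the hardware and the boot environment can offer.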

2. The problem that can be resolved more easily is the post-install
configuration.  Oscar is really good at the initial install but try
doing something like building the latest kernel from source and
distributing it to the nodes.  It's not easy.  We have a lot of things
that need to be installed from source.  What I do is chroot to the
image, make the changes and do a cpushimage.  It works but it's not
oscar friendly.  It would be nice if I could do something more in the
GUI.  I understand that updating the image is going to be text and
manual but I don't believe the gui lets me push the updated image.
Keep in mind that I don't want to reformat the disk every time.  I may
just want to push a single updated binary out.

I don't know that this is "not oscar friendly" as we provide the tools,
just not the gui.

Yes, I think that's what he meant. Very definitely a user perspective here; he doesn't know/care how OSCAR is implemented internally (although he does use c3 and the other tools that OSCAR provides).

But, should be eminently doable from that spiffy new
portal--well, it is directly doable from the current portal's C3 tools
page, but a more purpose-built variant to push/get an image would be
useful.  We need the localboot/install magic for PXEBOOT manageable
there, too.

I think that's the goal here -- it would be nice if some kind of gui (even a web-based thingy) could do some of these common tasks easily/trivially. Perhaps a shiny button "re-push image X out to the relevant nodes."
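
As a rough sketch of what such a button might do behind the scenes, the Python below only builds the C3 command lines and prints them (a dry run). The helper names are hypothetical, the image name and file path are just examples, and I am assuming the simplest invocations of `cpushimage` and `cpush` with no options or machine definitions; check the C3 man pages before relying on the exact arguments.

```python
import shlex
from typing import List, Optional


def push_image_command(image_name: str) -> List[str]:
    """Build the argv for re-pushing a whole image.

    Assumes the bare form 'cpushimage <image_name>'.
    """
    return ["cpushimage", image_name]


def push_file_command(source: str, target: Optional[str] = None) -> List[str]:
    """Build the argv for pushing one file (e.g. a single updated binary).

    Assumes the bare form 'cpush <source> [target]'.
    """
    cmd = ["cpush", source]
    if target is not None:
        cmd.append(target)
    return cmd


def render(cmd: List[str]) -> str:
    """Return the command as a shell-safe string for display in a GUI."""
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: show what would be executed, without touching any nodes.
    print(render(push_image_command("oscarimage")))
    print(render(push_file_command("/usr/local/bin/mytool")))
```

Keeping the whole-image push and the single-file push as separate actions matches the request above: not every update should mean re-imaging the disk.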

3. Also, ganglia is a nice monitor but it would be nice to be able to
do things like reboot a node from the gui.  I know it can be done at
the command line.

I don't see this as a ganglia feature--as a portal feature, this would
be fine as would the above item.

Agreed. I think this stems from the user perspective of "I can see all this stuff in Ganglia, but I can't *do* anything to/with it -- it's just reporting. But it seems like a natural place to let me *do* things as well."

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/


