From a provisioning module's standpoint, I would consider a node loaded once the bits are on the node's disk and it has been powered on. After this point, the OS module is responsible. xCAT detecting the boot state would be equivalent to successfully turning on the VM.

There is a bit of a dilemma though. xCAT's makesshgkh should be run after sshd is up on the node to scan its keys; otherwise a key mismatch error is displayed in SSH output. I don't know of an elegant way around this. I like the idea of a provisioning module only having to worry about putting the bits on the disk and starting the node, with control then passing back to new.pm and on to the OS module. This may limit flexibility, though, if the provisioning module has to do things both before and after the OS module.

One solution would be to have the provisioning module call $os->post_load() instead of it being called from new.pm. This would match how image capture is done (capture() calls $os->pre_capture()). The reasoning there was that the provisioning module needed more control for special situations. The downside is that it imposes an additional requirement when creating a provisioning module. I'm leaning this way at the moment.
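A rough sketch of what that could look like, with the host key scan running only after the OS module reports the node is up. The sub name load_and_post_load, the $steps hash and its keys are made up for illustration and are not from the VCL code; the code refs stand in for the real deploy and makesshgkh work.

```perl
use strict;
use warnings;

# Hypothetical provisioning-module load that calls the OS module itself.
# $os is any object with a post_load() method. $steps holds code refs that
# stand in for the real work (names are assumptions, not VCL code).
sub load_and_post_load {
    my ($os, $steps, $install_attempt) = @_;

    # Put the bits on the disk and power the node on.
    my $deployed = $steps->{deploy_and_power_on}->($install_attempt);
    return $deployed if !$deployed;    # pass 0 or undef straight back

    # Let the OS module wait for the OS (including sshd) to respond.
    my $os_ready = $os->post_load($install_attempt);
    return $os_ready if !$os_ready;    # pass 0 or undef straight back

    # Only now scan the node's host keys (the makesshgkh step for xCAT).
    return $steps->{scan_host_keys}->() ? 1 : 0;
}
```

Returning the 0 or undefined value unchanged keeps the "try again" versus "give up" distinction intact for whoever called the provisioning module.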



Aaron Peeler wrote:
Sounds good.
On the vm modules, we'd need to decide when a node is considered loaded. With xcat there is a state we can check: install, image, boot, etc. With vm we just start the vm. A couple of ideas: wait some time period, wait until it's pingable, or something else, then return that it's loaded.
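The "wait until it's pingable" idea could be sketched as a generic polling loop. The sub name and its parameters are hypothetical; $check is any code ref that returns true once the node responds.

```perl
use strict;
use warnings;

# Hypothetical helper: poll $check until it returns true or the timeout
# expires. Returns 1 if the node responded in time, 0 otherwise.
sub wait_until_responding {
    my ($check, $timeout_seconds, $interval_seconds) = @_;

    my $deadline = time + $timeout_seconds;
    while (1) {
        return 1 if $check->();          # node responded
        return 0 if time >= $deadline;   # gave up waiting
        sleep $interval_seconds;
    }
}
```

The check could be a closure around Net::Ping's ping(), or anything else the module considers proof that the VM is up.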


--On January 19, 2010 1:16:11 PM -0500 Andy Kurth <andy_ku...@ncsu.edu> wrote:

I'd like to propose a design change for the modularized backend code.
The provisioning modules (xCAT.pm, vmware.pm, etc) are currently
responsible for monitoring and waiting for the computer's OS to respond
after an image has been loaded.  It would be better if this task were
handled by the OS modules because the sequence of things to monitor and
the appropriate timeouts vary widely among OS's.

This will solve a problem currently affecting Windows Server 2008, Vista,
and most likely Windows 7.  Sysprep's mini-setup phase takes a horrendously
long time with the newer OS's compared to XP and 2003.  This is causing
timeouts to be reached before Sysprep is done.  There is currently no way
to specify longer timeouts for the newer versions of Windows without
having it apply to all OS's.  Having the OS module monitor and wait for
the computer to respond would solve this problem.

To accomplish this, the waiting/monitoring responsibility would be moved
to the post_load() subroutine in the OS module and new.pm will keep track
of the install attempt count, passing it to load() and post_load().  The
sequence is:
1. new.pm calls $provisioner->load($install_attempt)
2. new.pm calls $os->post_load($install_attempt)

The return value sent back to the new.pm module by the provisioning
module's load() subroutine would be:
1: computer is done being loaded and ready for OS post_load()
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

The return value sent back to new.pm from the OS module's post_load()
would be:
1: computer OS is configured and ready for a reservation
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again
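The sequence and return values above might translate into a loop in new.pm along these lines. This is only a sketch: reload_computer and $max_attempts are assumed names, and $provisioner and $os are any objects providing load() and post_load().

```perl
use strict;
use warnings;

# Hypothetical driver for the load/post_load sequence.
# Returns 1 when the computer is ready for a reservation, 0 otherwise.
sub reload_computer {
    my ($provisioner, $os, $max_attempts) = @_;

    for my $install_attempt (1 .. $max_attempts) {
        # Step 1: the provisioning module loads the image.
        my $load_result = $provisioner->load($install_attempt);
        return 0 if !defined $load_result;    # undefined: don't attempt again
        next     if !$load_result;            # 0: attempt image load again

        # Step 2: the OS module waits for and configures the OS.
        my $post_load_result = $os->post_load($install_attempt);
        return 0 if !defined $post_load_result;    # undefined: give up
        return 1 if $post_load_result;             # 1: ready for a reservation
        # 0: fall through and attempt the image load again
    }

    return 0;    # ran out of attempts
}
```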

Having new.pm keep track of the attempt count and pass it to the load()
and post_load() subroutines allows them to be able to use this value to
adjust their timeouts and return values if appropriate.
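As one possible illustration, an OS module could scale its own base timeout by the attempt number. The sub name and the linear scaling rule are assumptions for the sake of example, not a proposal for the actual values.

```perl
use strict;
use warnings;

# Hypothetical helper for an OS module's post_load(): each OS module
# supplies its own base timeout, and later attempts wait longer.
sub post_load_timeout_seconds {
    my ($base_seconds, $install_attempt) = @_;

    # Treat a missing or invalid attempt count as the first attempt.
    $install_attempt = 1 if !defined $install_attempt || $install_attempt < 1;

    return $base_seconds * $install_attempt;
}
```

A module for one of the newer Windows versions would pass a larger base value than the XP or 2003 modules, without affecting any other OS.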

This will also remove the SSH dependency from the provisioning modules.
There was a thread a few months ago about supporting methods other than
SSH to control computers.  This will facilitate that feature.

I created VCL-291 and will begin to work on this.  Please reply if you
have any thoughts or suggestions.


Aaron Peeler
OIT Advanced Computing
College of Engineering-NCSU

Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
