Re: Move response checking from provisioning module to OS module

2010-02-03 Thread Andy Kurth
This is an update on VCL-291.  I have made the changes to all of the relevant 
files except for esxthin.pm and committed them.  I didn't want to touch 
esxthin.pm in case Brian is working on it.  The changes are currently being 
tested/used in NCSU's production implementation and I have not seen any issues 
so far.  Having the OS modules check for SSH after an image is loaded is working 
better.  This is done by OS.pm::wait_for_response().  Version_6.pm also 
implements its own wait_for_response() which accounts for longer times needed by 
Vista/2008.


The subroutine suggested by Sean to perform multiple attempts of running code 
has been implemented as Module.pm::code_loop_timeout().  It takes arguments of: 
a code reference, arguments to be passed to the code, message string to be 
displayed in the log output, total seconds to attempt to run the code, seconds 
between attempts.  It runs the code reference until it returns true or the 
timeout is reached.  If the code reference returns false, it waits the specified 
number of seconds and tries again.  This is used by wait_for_ssh(), 
wait_for_ping(), and wait_for_no_ping() in OS.pm.
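
For readers who have not looked at the commit, the retry helper described above can be sketched roughly as follows. This is a standalone approximation and not the code in Module.pm: the argument order follows the description, while the log format, the 1/0 return values, and the example check are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sketch of a retry helper modeled on the description of code_loop_timeout().
# Not the Module.pm implementation: log format and return values are assumed.
sub code_loop_timeout {
    my ($code_ref, $args_ref, $message, $total_seconds, $interval_seconds) = @_;
    $args_ref         = [] unless defined $args_ref;
    $interval_seconds = 1  unless defined $interval_seconds;

    my $end_time = time + $total_seconds;
    my $attempt  = 0;

    while (1) {
        $attempt++;
        print "attempt $attempt: $message\n";

        # Success: stop as soon as the code reference returns true.
        return 1 if $code_ref->(@{$args_ref});

        # Failure: give up once the total time has been used up,
        # otherwise wait and try again.
        last if time >= $end_time;
        sleep $interval_seconds;
    }

    print "timed out after $attempt attempt(s): $message\n";
    return 0;
}

# Example with a made-up check that starts succeeding on its third call.
my $check_calls = 0;
my $fake_check  = sub {
    my ($node) = @_;
    $check_calls++;
    return $check_calls >= 3;
};

my $responded = code_loop_timeout($fake_check, ['node1'], 'waiting for node1 to respond', 10, 0);
print $responded ? "node1 responded\n" : "node1 did not respond\n";
```

A wait_for_ssh()-style caller would pass its own check as the code reference and pick the total and interval seconds to suit the OS.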


Brian - the changes for esxthin.pm are minor.  In load(), remove the SSH
checking section and replace it with a call to $os->post_load().  This code can
be copied from one of the other provisioning modules.  I can go ahead and do
this but I don't have a way of testing it.


-Andy

Andy Kurth wrote:
I think having the provisioning modules call post_load() will allow for 
the greatest flexibility.  Since it shouldn't be much of an imposition 
and matches how capture() works, I'll work on coding the following changes:


- call to $os->post_load() removed from new.pm
- call to $os->post_load() added to each provisioning module's load() sub
- code removed in provisioning modules which waits for computer to respond
- code which waits for the computer to respond added to each OS module's
  post_load() sub


After thinking about it, I don't see the need for new.pm to do any 
looping.  It should call the provisioning module's load() sub once and 
check the return value.  The provisioning modules can implement repeated 
attempts as necessary within the load() sub.


It is useful for $os->post_load() to know whether it's being run for the
first time or as a repeat attempt.  It could increase its timeouts on
a repeated attempt.  I will have the provisioning modules pass it the
attempt number.
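
A rough sketch of how this could fit together is below. It is not the committed code: the Sketch:: package names, the hash keys, and the linear timeout scaling are all assumptions. It only shows load() owning the retry loop and handing the attempt number to post_load().

```perl
use strict;
use warnings;

# Hypothetical stand-in for an OS module.  Only post_load() matters here.
package Sketch::OS;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# The attempt number is used to stretch the timeout on repeat attempts.
# Linear scaling is an assumption made for this sketch.
sub post_load {
    my ($self, $attempt) = @_;
    $attempt = 1 unless $attempt;
    my $timeout = $self->{base_timeout} * $attempt;

    # Pretend the computer needs a fixed number of seconds to respond.
    return ($self->{seconds_to_respond} <= $timeout) ? 1 : 0;
}

# Hypothetical stand-in for a provisioning module.
package Sketch::Provisioning;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# load() owns the retry loop and calls the OS module itself.
sub load {
    my ($self) = @_;
    for my $attempt (1 .. $self->{max_attempts}) {
        # ... copy the image to disk and power on the computer here ...
        $self->{attempts_used} = $attempt;
        return 1 if $self->{os}->post_load($attempt);
    }
    return 0;
}

package main;

my $os          = Sketch::OS->new(base_timeout => 300, seconds_to_respond => 500);
my $provisioner = Sketch::Provisioning->new(os => $os, max_attempts => 3);
my $loaded      = $provisioner->load();
print "loaded=$loaded after $provisioner->{attempts_used} attempt(s)\n";
```

With these made-up numbers the first attempt times out at 300 seconds and the second succeeds with a 600 second timeout.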


A looping subroutine would be useful in many cases.  Good idea.  It may 
be better to create such a sub in Module.pm so that all types of modules 
have access to it.  I'm thinking of creating a sub to which you pass a code
reference and some timeout parameters.  It runs the code repeatedly
until it returns true or the timeout is reached.


Thanks,
Andy


Sean Dilda wrote:
I don't think having $provisioning->load() call $os->post_load() would
be such an imposition.  However, if you want to have multiple
attempts at $os->post_load() as was mentioned in an earlier email,
it'd be nice to have a function defined in the OS module to do that
looping.


I haven't used the xCAT module.  What does makesshgkh do?



--
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_ku...@ncsu.edu
919.513.4090


Re: Move response checking from provisioning module to OS module

2010-01-20 Thread Sean Dilda

Aaron Peeler wrote:
makesshgkh is part of xCAT 1.3 (which is EOL'd) and is used to collect the
ssh host keys after the install.


xCAT 2.x does something different to collect the ssh host keys, so
eventually makesshgkh and the original xCAT.pm module will not be needed.





Ok, that makes sense.

Based on this, I assume you have different ssh keys on different images.
If so, how do you keep your users who regularly check out Linux images
from having key conflicts?  In order to work around this, we're ensuring
all of our images have the same ssh keys.  Thus, no need to collect the
key after imaging, and our users don't have to worry about key conflicts.


Re: Move response checking from provisioning module to OS module

2010-01-19 Thread Josh Thompson
Sounds like a good idea to me.

Josh

On Tuesday January 19, 2010, Andy Kurth wrote:
 I'd like to propose a design change for the modularized backend code.  The
 provisioning modules (xCAT.pm, vmware.pm, etc) are currently responsible
 for monitoring and waiting for the computer's OS to respond after an image
 has been loaded.  It would be better if this task were handled by the OS
 modules because the sequence of things to monitor and the appropriate
 timeouts vary widely among OS's.

 This will solve a problem currently affecting Windows Server 2008, Vista,
 and most likely Windows 7.  Sysprep's mini-setup phase takes a horrendously long
 time with the newer OS's compared to XP and 2003.  This is causing timeouts
 to be reached before Sysprep is done.  There is currently no way to specify
 longer timeouts for the newer versions of Windows without having it apply
 to all OS's. Having the OS module monitor and wait for the computer to
 respond would solve this problem.

 To accomplish this, the waiting/monitoring responsibility would be moved to
 the post_load() subroutine in the OS module and new.pm will keep track of
 the install attempt count, passing it to load() and post_load().  The
 sequence is:
 1. new.pm calls $provisioner->load($install_attempt)
 2. new.pm calls $os->post_load($install_attempt)

 The return value sent back to the new.pm module by the provisioning
 module's load() subroutine would be:
 1: computer is done being loaded and ready for OS post_load()
 0: error occurred, attempt image load again
 undefined: error occurred, don't attempt load again

 The return value sent back to new.pm from the OS module's post_load() would
 be:
 1: computer OS is configured and ready for a reservation
 0: error occurred, attempt image load again
 undefined: error occurred, don't attempt load again

 Having new.pm keep track of the attempt count and pass it to the load() and
 post_load() subroutines allows them to be able to use this value to adjust
 their timeouts and return values if appropriate.
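
The sequence and return values proposed in this message could be sketched as follows. This only illustrates the proposal as worded here and is not code from new.pm: run_load_sequence(), the Sketch::Stub package, and the use of 0 for "attempts used up" are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stub standing in for both a provisioning module and an OS
# module; each method just runs the code reference it was built with.
package Sketch::Stub;

sub new {
    my ($class, %subs) = @_;
    return bless {%subs}, $class;
}
sub load      { my ($self, @args) = @_; return $self->{load}->(@args); }
sub post_load { my ($self, @args) = @_; return $self->{post_load}->(@args); }

package main;

# Sketch of the proposed new.pm sequence.  Returns 1 on success, 0 when
# the attempts are used up, and undef when a module says not to retry.
sub run_load_sequence {
    my ($provisioner, $os, $max_attempts) = @_;

    for my $install_attempt (1 .. $max_attempts) {
        my $load_result = $provisioner->load($install_attempt);
        return undef unless defined $load_result;    # don't attempt load again
        next unless $load_result;                    # 0: attempt image load again

        my $post_load_result = $os->post_load($install_attempt);
        return undef unless defined $post_load_result;
        return 1 if $post_load_result;               # ready for a reservation
    }
    return 0;
}

# Made-up modules: the image load fails once, then everything succeeds.
my @load_attempts;
my $provisioner = Sketch::Stub->new(
    load => sub {
        my ($attempt) = @_;
        push @load_attempts, $attempt;
        return $attempt >= 2 ? 1 : 0;
    },
);
my $os = Sketch::Stub->new(post_load => sub { return 1 });

my $result = run_load_sequence($provisioner, $os, 3);
print "result=$result, load attempts: @load_attempts\n";
```

Keeping undefined distinct from 0 is what lets the caller tell "try again" apart from "give up now".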

 This will also remove the SSH dependency from the provisioning modules. 
 There was a thread a few months ago about supporting methods other than SSH
 to control computers.  This will facilitate that feature.

 I created VCL-291 and will begin to work on this.  Please reply if you have
 any thoughts or suggestions.

 Thanks,
 Andy
--
Josh Thompson
Systems Programmer
Advanced Computing | VCL Developer
North Carolina State University

josh_thomp...@ncsu.edu
919-515-5323

my GPG/PGP key can be found at pgp.mit.edu


Re: Move response checking from provisioning module to OS module

2010-01-19 Thread Andy Kurth
I would consider a node loaded from a provisioning module's standpoint as the 
point when the bits are on the node's disk and it has been powered on.  After 
this point, the OS module is responsible.  xCAT detecting the boot state would 
be equivalent to successfully turning on the VM.


There is a bit of a dilemma though.  xCAT's makesshgkh should be run after sshd 
is up on the node to scan its keys or else a key mismatch error is displayed in 
SSH output.  I don't know of an elegant way around this.  I like the idea of a 
provisioning module only having to worry about putting the bits on the disk and 
starting the node and then control gets passed back to new.pm and then handed 
off to the OS module.  This may limit flexibility though if the provisioning 
module has to do things before and after the OS module.


One solution would be to have the provisioning module call $os->post_load()
instead of it being called from new.pm.  This would match how image capture is
done (capture() calls $os->pre_capture()).  The reasoning was that more
control needed to be given to the provisioning module for special situations.
The downside is that it imposes an additional requirement when creating a
provisioning module.  I'm leaning this way at the moment.


Thoughts?

Thanks,
Andy

Aaron Peeler wrote:

Sounds good.
On the vm modules, we'd need to decide when a node is considered
loaded.  With xCAT there is a state we can check (install, image, boot,
etc.).  With a vm we just start the vm.  A couple of ideas: wait
some time period, wait until it's pingable, or use some other check, and
then report it as loaded.


Aaron

Aaron Peeler
OIT Advanced Computing
College of Engineering-NCSU
919.513.4571
http://vcl.ncsu.edu


--
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_ku...@ncsu.edu
919.513.4090


Re: Move response checking from provisioning module to OS module

2010-01-19 Thread Aaron Peeler


Another option might be to have a $provision_module->post_load() routine.
Once the new.pm module detects the node is loaded and accessible, it could
call $provision_module->post_load() before moving on to the
$os->post_load().


Aaron

--On January 19, 2010 4:16:16 PM -0500 Andy Kurth andy_ku...@ncsu.edu 
wrote:



I would consider a node loaded from a provisioning module's standpoint as
the point when the bits are on the node's disk and it has been powered
on.  After this point, the OS module is responsible.  xCAT detecting the
boot state would be equivalent to successfully turning on the VM.

There is a bit of a dilemma though.  xCAT's makesshgkh should be run
after sshd is up on the node to scan its keys or else a key mismatch
error is displayed in SSH output.  I don't know of an elegant way around
this.  I like the idea of a provisioning module only having to worry
about putting the bits on the disk and starting the node and then control
gets passed back to new.pm and then handed off to the OS module.  This
may limit flexibility though if the provisioning module has to do things
before and after the OS module.

One solution would be to have the provisioning module call
$os->post_load() instead of it being called from new.pm.  This would
match how image capture is done (capture() calls $os->pre_capture()).
The reasoning was that more control needed to be given to the
provisioning module for special situations. The downside is that it
imposes an additional requirement when creating a provisioning module.
I'm leaning this way at the moment.

Thoughts?

Thanks,
Andy

Aaron Peeler
OIT Advanced Computing
College of Engineering-NCSU
919.513.4571
http://vcl.ncsu.edu


Re: Move response checking from provisioning module to OS module

2010-01-19 Thread Sean Dilda

Andy Kurth wrote:
I would consider a node loaded from a provisioning module's standpoint as the 
point when the bits are on the node's disk and it has been powered on.  After 
this point, the OS module is responsible.  xCAT detecting the boot state would 
be equivalent to successfully turning on the VM.


There is a bit of a dilemma though.  xCAT's makesshgkh should be run after sshd 
is up on the node to scan its keys or else a key mismatch error is displayed in 
SSH output.  I don't know of an elegant way around this.  I like the idea of a 
provisioning module only having to worry about putting the bits on the disk and 
starting the node and then control gets passed back to new.pm and then handed 
off to the OS module.  This may limit flexibility though if the provisioning 
module has to do things before and after the OS module.


One solution would be to have the provisioning module call $os->post_load()
instead of it being called from new.pm.  This would match how image capture is
done (capture() calls $os->pre_capture()).  The reasoning was that more
control needed to be given to the provisioning module for special situations.
The downside is that it imposes an additional requirement when creating a
provisioning module.  I'm leaning this way at the moment.


Thoughts?



I don't think having $provisioning->load() call $os->post_load() would
be such an imposition.  However, if you want to have multiple attempts
at $os->post_load() as was mentioned in an earlier email, it'd be nice
to have a function defined in the OS module to do that looping.


I haven't used the xCAT module.  What does makesshgkh do?