Re: Move response checking from provisioning module to OS module
This is an update on VCL-291. I have made the changes to all of the relevant files except for esxthin.pm and committed them. I didn't want to touch esxthin.pm in case Brian is working on it. The changes are currently being tested/used in NCSU's production implementation and I have not seen any issues so far.

Having the OS modules check for SSH after an image is loaded is working better. This is done by OS.pm::wait_for_response(). Version_6.pm also implements its own wait_for_response(), which accounts for the longer times needed by Vista/2008.

The subroutine suggested by Sean to perform multiple attempts of running code has been implemented as Module.pm::code_loop_timeout(). It takes the following arguments:
-a code reference
-arguments to be passed to the code
-message string to be displayed in the log output
-total seconds to attempt to run the code
-seconds between attempts

It runs the code reference until it returns true or the timeout is reached. If the code reference returns false, it waits the specified number of seconds and tries again. This is used by wait_for_ssh(), wait_for_ping(), and wait_for_no_ping() in OS.pm.

Brian - the changes for esxthin.pm are minor. In load(), remove the SSH checking section and replace it with a call to $os->post_load(). This code can be copied from one of the other provisioning modules. I can go ahead and do this but I don't have a way of testing it.

-Andy

Andy Kurth wrote:

I think having the provisioning modules call post_load() will allow for the greatest flexibility. Since it shouldn't be much of an imposition and matches how capture() works, I'll work on coding the following changes:
-call to $os->post_load() removed from new.pm
-call to $os->post_load() added to each provisioning module's load() sub
-code removed in provisioning modules which waits for computer to respond
-code which waits for the computer to respond added to each OS module's post_load() sub

After thinking about it, I don't see the need for new.pm to do any looping. It should call the provisioning module's load() sub once and check the return value. The provisioning modules can implement repeated attempts as necessary within the load() sub.

It is useful for $os->post_load() to know if it's being run for the first time or as a repeat attempt. It could increase its timeouts if it's a repeated attempt. I will have the provisioning modules pass it the attempt number.

A looping subroutine would be useful in many cases. Good idea. It may be better to create such a sub in Module.pm so that all types of modules have access to it. I'm thinking of creating a sub which you pass a code reference and some timeout parameters. It attempts to run the code until it returns true or the timeout is reached.

Thanks,
Andy

Sean Dilda wrote:

I don't think having $provisioning->load() call $os->post_load() would be such an imposition. However, if you want to have multiple attempts at $os->post_load() as was mentioned in an earlier email, it'd be nice to have a function defined in the OS module to do that looping.

I haven't used the xCAT module. What does makesshgkh do?

--
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_ku...@ncsu.edu
919.513.4090
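
The retry behavior described for code_loop_timeout() can be sketched roughly as follows. This is only an illustration of the idea, not the code committed to Module.pm: the argument order follows the description in the update, but passing the code's arguments as an array reference, the print-based logging, and the 1/0 return values are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Module.pm::code_loop_timeout(); the real
# subroutine's signature, logging and return values may differ.
sub code_loop_timeout {
    my ($code_ref, $args, $message, $total_seconds, $interval_seconds) = @_;

    my $deadline = time + $total_seconds;
    my $attempt  = 0;

    while (1) {
        $attempt++;
        print "attempt $attempt: $message\n";

        # Stop as soon as the code reference returns a true value
        return 1 if $code_ref->(@{$args});

        # Give up if another wait would run past the deadline
        last if time + $interval_seconds > $deadline;
        sleep $interval_seconds;
    }

    # The code never returned true before the timeout
    return 0;
}
```

A caller such as wait_for_ssh() could then pass a code reference that performs a single SSH check along with OS-specific timeout values.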
Re: Move response checking from provisioning module to OS module
Aaron Peeler wrote:

makesshgkh is part of xCAT 1.3 (which is EOL'd) and is used to collect the ssh host keys after the install. xCAT 2.x does something different to collect the ssh host keys, so eventually makesshgkh and the original xCAT.pm module will not be needed.

Ok, that makes sense. Based on this, I assume you have different ssh keys on different images. If so, how do you keep your users who regularly check out Linux images from having key conflicts?

In order to work around this, we're ensuring all of our images have the same ssh keys. Thus, there is no need to collect the key after imaging, and our users don't have to worry about key conflicts.
Re: Move response checking from provisioning module to OS module
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sounds like a good idea to me.

Josh

On Tuesday January 19, 2010, Andy Kurth wrote:

I'd like to propose a design change for the modularized backend code. The provisioning modules (xCAT.pm, vmware.pm, etc) are currently responsible for monitoring and waiting for the computer's OS to respond after an image has been loaded. It would be better if this task were handled by the OS modules because the sequence of things to monitor and the appropriate timeouts vary widely among OS's.

This will solve a problem currently affecting Windows Server 2008, Vista, and most likely 2007. Sysprep's mini-setup phase takes a horrendously long time with the newer OS's compared to XP and 2003. This is causing timeouts to be reached before Sysprep is done. There is currently no way to specify longer timeouts for the newer versions of Windows without having it apply to all OS's. Having the OS module monitor and wait for the computer to respond would solve this problem.

To accomplish this, the waiting/monitoring responsibility would be moved to the post_load() subroutine in the OS module and new.pm will keep track of the install attempt count, passing it to load() and post_load(). The sequence is:
1. new.pm calls $provisioner->load($install_attempt)
2. new.pm calls $os->post_load($install_attempt)

The return value sent back to the new.pm module by the provisioning module's load() subroutine would be:
1: computer is done being loaded and ready for OS post_load()
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

The return value sent back to new.pm from the OS module's post_load() would be:
1: computer OS is configured and ready for a reservation
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

Having new.pm keep track of the attempt count and pass it to the load() and post_load() subroutines allows them to use this value to adjust their timeouts and return values if appropriate.

This will also remove the SSH dependency from the provisioning modules. There was a thread a few months ago about supporting methods other than SSH to control computers. This will facilitate that feature.

I created VCL-291 and will begin to work on this. Please reply if you have any thoughts or suggestions.

Thanks,
Andy

- --
- ---
Josh Thompson
Systems Programmer
Advanced Computing | VCL Developer
North Carolina State University
josh_thomp...@ncsu.edu
919-515-5323
my GPG/PGP key can be found at pgp.mit.edu

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFLVfiyV/LQcNdtPQMRAqVuAJ9inngb+CNv8VsWLj9yJ/2GJBlu2QCfT5Qz
/uKWNBNsRHOiubWr1bRGry4=
=nwRt
-----END PGP SIGNATURE-----
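
The 1 / 0 / undefined return convention in the proposal quoted above could be handled by the caller along these lines. This is a minimal sketch of the proposed sequence only; run_load_sequence() and $max_attempts are hypothetical names and are not taken from new.pm.

```perl
use strict;
use warnings;

# Hypothetical driver for the proposed sequence; not actual new.pm code.
sub run_load_sequence {
    my ($provisioner, $os, $max_attempts) = @_;

    for my $install_attempt (1 .. $max_attempts) {
        # Step 1: the provisioning module loads the image
        my $result = $provisioner->load($install_attempt);

        # Step 2: hand off to the OS module only if the load succeeded
        $result = $os->post_load($install_attempt) if $result;

        return 1 if $result;              # 1: ready for a reservation
        return unless defined $result;    # undefined: don't attempt load again
        # 0: fall through and attempt the image load again
    }

    # Every attempt returned 0
    return 0;
}
```

Passing $install_attempt to both calls lets either module lengthen its timeouts on a repeat attempt, which is the motivation given in the proposal.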
Re: Move response checking from provisioning module to OS module
I would consider a node loaded from a provisioning module's standpoint as the point when the bits are on the node's disk and it has been powered on. After this point, the OS module is responsible. xCAT detecting the boot state would be equivalent to successfully turning on the VM.

There is a bit of a dilemma though. xCAT's makesshgkh should be run after sshd is up on the node to scan its keys or else a key mismatch error is displayed in SSH output. I don't know of an elegant way around this. I like the idea of a provisioning module only having to worry about putting the bits on the disk and starting the node, after which control gets passed back to new.pm and then handed off to the OS module. This may limit flexibility though if the provisioning module has to do things before and after the OS module.

One solution would be to have the provisioning module call $os->post_load() instead of it being called from new.pm. This would match how image capture is done (capture() calls $os->pre_capture()). The reasoning was that more control needed to be given to the provisioning module for special situations. The downside is that it imposes an additional requirement when creating a provisioning module. I'm leaning this way at the moment.

Thoughts?

Thanks,
Andy

Aaron Peeler wrote:

Sounds good.

On the vm modules, we'd need to decide on when a node is considered loaded. With xcat there is a state we can check: install, image, boot, etc. With vm we just start the vm. A couple of ideas are: maybe we wait some time period, or until it's pingable, or something else, then return that it's loaded.

Aaron

--On January 19, 2010 1:16:11 PM -0500 Andy Kurth andy_ku...@ncsu.edu wrote:

I'd like to propose a design change for the modularized backend code. The provisioning modules (xCAT.pm, vmware.pm, etc) are currently responsible for monitoring and waiting for the computer's OS to respond after an image has been loaded. It would be better if this task were handled by the OS modules because the sequence of things to monitor and the appropriate timeouts vary widely among OS's.

This will solve a problem currently affecting Windows Server 2008, Vista, and most likely 2007. Sysprep's mini-setup phase takes a horrendously long time with the newer OS's compared to XP and 2003. This is causing timeouts to be reached before Sysprep is done. There is currently no way to specify longer timeouts for the newer versions of Windows without having it apply to all OS's. Having the OS module monitor and wait for the computer to respond would solve this problem.

To accomplish this, the waiting/monitoring responsibility would be moved to the post_load() subroutine in the OS module and new.pm will keep track of the install attempt count, passing it to load() and post_load(). The sequence is:
1. new.pm calls $provisioner->load($install_attempt)
2. new.pm calls $os->post_load($install_attempt)

The return value sent back to the new.pm module by the provisioning module's load() subroutine would be:
1: computer is done being loaded and ready for OS post_load()
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

The return value sent back to new.pm from the OS module's post_load() would be:
1: computer OS is configured and ready for a reservation
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

Having new.pm keep track of the attempt count and pass it to the load() and post_load() subroutines allows them to use this value to adjust their timeouts and return values if appropriate.

This will also remove the SSH dependency from the provisioning modules. There was a thread a few months ago about supporting methods other than SSH to control computers. This will facilitate that feature.

I created VCL-291 and will begin to work on this. Please reply if you have any thoughts or suggestions.

Thanks,
Andy

Aaron Peeler
OIT Advanced Computing
College of Engineering-NCSU
919.513.4571
http://vcl.ncsu.edu

--
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_ku...@ncsu.edu
919.513.4090
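
The option of having the provisioning module call the OS module's post_load() itself, so that it can do work both before and after the OS responds, might look something like the sketch below. The package name and the deploy_image and after_os_ready callbacks are hypothetical placeholders (the latter standing in for a step such as collecting SSH host keys); none of this is taken from an existing VCL provisioning module.

```perl
use strict;
use warnings;

package Sketch::Provisioning;

# Hypothetical constructor: expects an 'os' object plus two code references
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub load {
    my ($self) = @_;

    # Put the bits on the disk and power the node on
    return 0 unless $self->{deploy_image}->();

    # Hand off to the OS module, mirroring how capture() calls pre_capture()
    return 0 unless $self->{os}->post_load();

    # Provisioning-specific work that needs the OS to be responding
    return 0 unless $self->{after_os_ready}->();

    return 1;
}

package main;
```

The trade-off raised in the message still applies: every provisioning module written this way has to remember to make the post_load() call.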
Re: Move response checking from provisioning module to OS module
Another option might be to have a $provision_module->post_load() routine. Once the new.pm module detects the node is loaded and accessible, it could call $provision_module->post_load() before moving on to the $os->post_load().

Aaron

--On January 19, 2010 4:16:16 PM -0500 Andy Kurth andy_ku...@ncsu.edu wrote:

I would consider a node loaded from a provisioning module's standpoint as the point when the bits are on the node's disk and it has been powered on. After this point, the OS module is responsible. xCAT detecting the boot state would be equivalent to successfully turning on the VM.

There is a bit of a dilemma though. xCAT's makesshgkh should be run after sshd is up on the node to scan its keys or else a key mismatch error is displayed in SSH output. I don't know of an elegant way around this. I like the idea of a provisioning module only having to worry about putting the bits on the disk and starting the node, after which control gets passed back to new.pm and then handed off to the OS module. This may limit flexibility though if the provisioning module has to do things before and after the OS module.

One solution would be to have the provisioning module call $os->post_load() instead of it being called from new.pm. This would match how image capture is done (capture() calls $os->pre_capture()). The reasoning was that more control needed to be given to the provisioning module for special situations. The downside is that it imposes an additional requirement when creating a provisioning module. I'm leaning this way at the moment.

Thoughts?

Thanks,
Andy

Aaron Peeler wrote:

Sounds good.

On the vm modules, we'd need to decide on when a node is considered loaded. With xcat there is a state we can check: install, image, boot, etc. With vm we just start the vm. A couple of ideas are: maybe we wait some time period, or until it's pingable, or something else, then return that it's loaded.

Aaron

--On January 19, 2010 1:16:11 PM -0500 Andy Kurth andy_ku...@ncsu.edu wrote:

I'd like to propose a design change for the modularized backend code. The provisioning modules (xCAT.pm, vmware.pm, etc) are currently responsible for monitoring and waiting for the computer's OS to respond after an image has been loaded. It would be better if this task were handled by the OS modules because the sequence of things to monitor and the appropriate timeouts vary widely among OS's.

This will solve a problem currently affecting Windows Server 2008, Vista, and most likely 2007. Sysprep's mini-setup phase takes a horrendously long time with the newer OS's compared to XP and 2003. This is causing timeouts to be reached before Sysprep is done. There is currently no way to specify longer timeouts for the newer versions of Windows without having it apply to all OS's. Having the OS module monitor and wait for the computer to respond would solve this problem.

To accomplish this, the waiting/monitoring responsibility would be moved to the post_load() subroutine in the OS module and new.pm will keep track of the install attempt count, passing it to load() and post_load(). The sequence is:
1. new.pm calls $provisioner->load($install_attempt)
2. new.pm calls $os->post_load($install_attempt)

The return value sent back to the new.pm module by the provisioning module's load() subroutine would be:
1: computer is done being loaded and ready for OS post_load()
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

The return value sent back to new.pm from the OS module's post_load() would be:
1: computer OS is configured and ready for a reservation
0: error occurred, attempt image load again
undefined: error occurred, don't attempt load again

Having new.pm keep track of the attempt count and pass it to the load() and post_load() subroutines allows them to use this value to adjust their timeouts and return values if appropriate.

This will also remove the SSH dependency from the provisioning modules. There was a thread a few months ago about supporting methods other than SSH to control computers. This will facilitate that feature.

I created VCL-291 and will begin to work on this. Please reply if you have any thoughts or suggestions.

Thanks,
Andy

Aaron Peeler
OIT Advanced Computing
College of Engineering-NCSU
919.513.4571
http://vcl.ncsu.edu

--
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_ku...@ncsu.edu
919.513.4090

Aaron Peeler
OIT Advanced Computing
College of Engineering-NCSU
919.513.4571
http://vcl.ncsu.edu
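
Aaron's alternative, where new.pm calls a provisioning-module post_load() before the OS module's post_load(), could be sketched as follows. finish_load() is a hypothetical name, and treating the provisioning hook as optional via can() is an assumption added for illustration; the message does not say whether every provisioning module would have to implement it.

```perl
use strict;
use warnings;

# Hypothetical new.pm-side helper; not part of the VCL code base.
sub finish_load {
    my ($provision_module, $os) = @_;

    # Run the provisioning module's post-load step first, if it defines one
    if ($provision_module->can('post_load')) {
        return 0 unless $provision_module->post_load();
    }

    # Then let the OS module finish configuring the computer
    return $os->post_load() ? 1 : 0;
}
```

Compared with having load() call $os->post_load() directly, this keeps the call order in new.pm, but the provisioning module can only act before the OS module's post_load(), not after it.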
Re: Move response checking from provisioning module to OS module
Andy Kurth wrote:

I would consider a node loaded from a provisioning module's standpoint as the point when the bits are on the node's disk and it has been powered on. After this point, the OS module is responsible. xCAT detecting the boot state would be equivalent to successfully turning on the VM.

There is a bit of a dilemma though. xCAT's makesshgkh should be run after sshd is up on the node to scan its keys or else a key mismatch error is displayed in SSH output. I don't know of an elegant way around this. I like the idea of a provisioning module only having to worry about putting the bits on the disk and starting the node, after which control gets passed back to new.pm and then handed off to the OS module. This may limit flexibility though if the provisioning module has to do things before and after the OS module.

One solution would be to have the provisioning module call $os->post_load() instead of it being called from new.pm. This would match how image capture is done (capture() calls $os->pre_capture()). The reasoning was that more control needed to be given to the provisioning module for special situations. The downside is that it imposes an additional requirement when creating a provisioning module. I'm leaning this way at the moment.

Thoughts?

I don't think having $provisioning->load() call $os->post_load() would be such an imposition. However, if you want to have multiple attempts at $os->post_load() as was mentioned in an earlier email, it'd be nice to have a function defined in the OS module to do that looping.

I haven't used the xCAT module. What does makesshgkh do?
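
The kind of looping function suggested here for repeated post_load() attempts might be as small as the sketch below. The name post_load_with_retries() and the $max_attempts parameter are hypothetical, and passing the attempt number through to post_load() is an assumption made so that the OS module could adjust its timeouts on later attempts.

```perl
use strict;
use warnings;

# Hypothetical helper; not an existing OS.pm subroutine.
sub post_load_with_retries {
    my ($os, $max_attempts) = @_;

    for my $attempt (1 .. $max_attempts) {
        # Tell post_load() which attempt this is so it can adapt
        return 1 if $os->post_load($attempt);
    }

    # No attempt succeeded
    return 0;
}
```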