Hi All,
I've been working on/thinking about a bug filed a while ago related to libvirt 
resize/cold migrations.  The bug was roughly as follows:

On a Packstack install, cold migrations and resizes fail under the default 
setup with an error about not being able to do an SSH `mkdir` operation.
The cause turned out to be that Nova could not do the resize because the 
individual compute nodes didn't have passwordless (key-based) SSH access
to the other compute nodes.

The proposed temporary fix was to manually give the compute nodes SSH 
access to each other, with the medium-term fix being to have Packstack
distribute SSH keys among the compute nodes and set up permissions.

While these fixes work, they left me with a certain dirty taste in my mouth, 
since it doesn't seem quite elegant to have Nova SSH-ing around
between compute nodes, and the upstream community seemed to agree with this 
(there was a thread a while ago, but I got sidetracked with other
work).  Upon further investigation, I found four points at which the Nova 
libvirt driver uses SSH, all of which revolve around the method
`migrate_disk_and_power_off` (the main part of the resize/cold migration code):

1. to detect shared storage
2. to create the directory for the instance on the destination system
3. to copy the disk image from the source to the destination system (uses 
either rsync over ssh or scp)
4. to remove the directory created in (2) in case of an error during the process

Number 1 can be trivially eliminated by using the existing 
'_is_instance_storage_shared' method in the RPCAPI from the compute manager, 
and passing that value to the driver (with the other drivers
most likely ignoring it) instead of checking from within the driver code.  
Numbers 2 and 4 can be eliminated by using a "pre_x, x, cleanup_x" flow, 
similar to how live migrations are handled (with "pre_x" and "cleanup_x"
being run on the destination machine via the RPCAPI).  
That only leaves number 3.  Note that numbers 2-4 only come into play when we
are going between machines without shared storage; with shared storage they
are not needed at all.
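
As a rough sketch of the "pre_x, x, cleanup_x" idea (every name below is hypothetical and made up for illustration; none of it is existing Nova code, and a plain Python object stands in for the RPCAPI call to the destination):

```python
import os
import shutil
import tempfile


class FakeDestinationHost:
    """Stand-in for the compute service on the destination (hypothetical)."""

    def __init__(self, instances_root):
        self.instances_root = instances_root

    def pre_cold_migration(self, instance_id):
        # Replaces number 2: the destination creates its own directory,
        # so the source never has to run `mkdir` over SSH.
        path = os.path.join(self.instances_root, instance_id)
        os.makedirs(path)
        return path

    def cleanup_cold_migration(self, instance_id):
        # Replaces number 4: the destination removes the directory itself.
        path = os.path.join(self.instances_root, instance_id)
        shutil.rmtree(path, ignore_errors=True)


def cold_migrate(dest, instance_id, shared_storage, copy_disk):
    """Run the pre/copy/cleanup flow; `copy_disk` is number 3."""
    if shared_storage:
        # Number 1 is decided by the caller and passed in as a flag;
        # with shared storage there is nothing to create or copy.
        return None
    dest_path = dest.pre_cold_migration(instance_id)
    try:
        copy_disk(dest_path)
    except Exception:
        dest.cleanup_cold_migration(instance_id)
        raise
    return dest_path
```

In this shape the only step that still needs a channel between the two hosts is whatever `copy_disk` does.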

So here's my question: can number 3 be "eliminated", so to speak?  Having to 
give full SSH permissions for a file copy seems like overkill (we could, for 
example, run an rsync daemon, in which case rsync would connect via the
daemon and not SSH).  Is it worth it?  
Additionally, if we do not eliminate number 3, is it worth refactoring the 
code to eliminate numbers 2 and 4?  (I already have code
to eliminate number 1 -- see https://gist.github.com/DirectXMan12/9217699.)
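
To make the rsync daemon option a bit more concrete, here is a minimal sketch of how the two command lines would differ.  The helper names, the hostname, and the `instances` module name are assumptions for illustration; the module would have to be defined in an rsyncd.conf on the destination, which nothing sets up today.  The sketch only builds the argument lists and does not run anything.

```python
def rsync_over_ssh_cmd(src, host, dest_path):
    # A single colon means rsync reaches the host through a remote shell
    # (ssh by default in current rsync), so SSH access is required.
    return ["rsync", "--sparse", "--compress", src,
            "%s:%s" % (host, dest_path)]


def rsync_daemon_cmd(src, host, module, rel_path):
    # A double colon means rsync talks to an rsync daemon on the host;
    # `module` is a name configured in that daemon's rsyncd.conf.
    return ["rsync", "--sparse", "--compress", src,
            "%s::%s/%s" % (host, module, rel_path)]


if __name__ == "__main__":
    print(" ".join(rsync_over_ssh_cmd(
        "/var/lib/nova/instances/abc/disk", "compute2",
        "/var/lib/nova/instances/abc/")))
    print(" ".join(rsync_daemon_cmd(
        "/var/lib/nova/instances/abc/disk", "compute2",
        "instances", "abc/")))
```

With the daemon form, the destination's exposure is limited to whatever the module's path and access rules allow, rather than a full shell login.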

Best Regards,
Solly Ross
