Ian Molton wrote:
> > It might make sense to have the reconnect logic in the egd chardev
> > backend then, thereby obsoleting the socket reconnect patch.
>
> Im not sure I agree there... surely there are other things which would
> benefit from generic socket reconnection support (virtio-rng cant be the
> only driver that might want to rely on a reliable source of data via a
> socket in a server-farm type situation?)

First of all: why are your egd daemons with open connections dying
anyway? Buggy egd?

Secondly: why isn't egd death an event reported over QMP, with a
monitor command to reconnect manually?

If guests need a _reliable_ source of data for security, staying silent
when it has gone away and hoping it comes back isn't good enough. It
should be an error condition known to management, which can halt the
guest until egd is fixed or restarts, if running without entropy isn't
acceptable under its policy.

Thirdly: which other things do you think would use it? Maybe some
virtio-serial apps would like it. But then it would need to sync with
the guest on reconnection, so that the guest can restart whatever
protocol it's using over the byte stream. In which case, it's better to
tell the guest that the connection died, and give the guest a way to
request a new one when it's ready.

Reconnecting and resuming in the middle of the byte stream would be bad
(even for the egd protocol?). Pure /dev/urandom fetching is quite
unusual in not caring about this, but you shouldn't need to reconnect
to that.

> Do we really want to re-implement reconnection (and reconnection retry
> anti-flood limiting) in every single backend?

I don't think it'll happen. I think egd is a rather unusual case. If
another backend ever needs it, it's easy to move code around.

I'm not convinced there's a need for it even for egd. Either egd
shouldn't be killing open connections (and is buggy if it is), or this
is normal egd behavior, in which case repeatedly reconnecting is part
of the egd protocol and can therefore go in the egd client code.

Meanwhile, because the egd might not return, its loss should be
reported as an error condition over QMP for management to do what it
deems appropriate. In which case, management could tell QEMU to
reconnect when it thinks is a good time, or do other things like switch
the randomness source to something else, or stop the guest, or warn the
admin that a guest is running without entropy.

-- Jamie
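
P.S. To make the management-side policy argued for above concrete, here
is a minimal sketch in Python. Everything in it is an assumption made
for illustration: the `Policy` fields, the `Action` names and the
`on_egd_disconnected` function are hypothetical and do not correspond
to any existing QMP event or monitor command. It only shows that the
decision (reconnect, switch source, stop the guest, warn the admin) can
live in management rather than in a silent retry loop in the backend.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    """Things management might do; names are hypothetical."""
    WARN_ADMIN = "warn-admin"
    RECONNECT = "reconnect"
    SWITCH_SOURCE = "switch-source"
    STOP_GUEST = "stop-guest"


@dataclass
class Policy:
    """Hypothetical per-guest policy knobs for this sketch."""
    entropy_required: bool     # guest must not run without entropy
    has_fallback_source: bool  # another randomness source is configured
    max_reconnects: int        # reconnect requests before giving up


def on_egd_disconnected(policy: Policy, reconnects_so_far: int) -> List[Action]:
    """Decide what to do when told that the egd connection died."""
    # The loss is always surfaced; it is never handled silently.
    actions = [Action.WARN_ADMIN]
    if reconnects_so_far < policy.max_reconnects:
        # Ask for a reconnect at a time management considers suitable.
        actions.append(Action.RECONNECT)
    elif policy.has_fallback_source:
        # Retries exhausted: move to a different randomness source.
        actions.append(Action.SWITCH_SOURCE)
    elif policy.entropy_required:
        # No entropy and none acceptable: halt the guest until fixed.
        actions.append(Action.STOP_GUEST)
    return actions
```

For example, a guest that requires entropy, has no fallback source and
has used up its reconnect attempts would get a warning followed by a
stop, whereas a guest that tolerates running without entropy would only
trigger the warning.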