I agree. In a perfect world, services would be distributed in Xen VMs across separate physical machines for high availability, and calculations would run as code that checkpoints intelligently and spreads the work across a cluster to remove single points of failure. I'm assuming that if your application takes weeks to complete a computing run, you're checkpointing often and offloading that data somewhere, so that if a box dies or needs to be rebooted you're not losing weeks of computation time already spent.
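For what it's worth, a minimal sketch of that checkpoint-and-resume pattern in Python (the file name, interval, and the toy "partial sum" workload are all just placeholders; in practice you'd write the checkpoint to NFS or some other off-box storage):

```python
import os
import pickle
import tempfile

CHECKPOINT = "checkpoint.pkl"  # placeholder path; point at remote storage in real use

def save_checkpoint(state, path=CHECKPOINT):
    """Write state to a temp file, then rename, so a crash mid-write
    can't corrupt the existing checkpoint (os.replace is atomic on POSIX)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CHECKPOINT):
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"iteration": 0, "partial_sum": 0}

state = load_checkpoint()
for i in range(state["iteration"], 1000):
    state["partial_sum"] += i       # stand-in for the real computation
    state["iteration"] = i + 1
    if state["iteration"] % 100 == 0:  # checkpoint every 100 iterations
        save_checkpoint(state)
save_checkpoint(state)
```

If the box dies, rerunning the same script picks up from the last saved iteration instead of from zero, which is the whole point when a run takes weeks.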
On Wed, Jun 17, 2009 at 12:30 AM, Dr Andrew C Aitchison <[email protected]> wrote:

> On Mon, 15 Jun 2009, Brandon Galbraith wrote:
>
>> Or run your services/calculations in a VM on Xen that you can snapshot,
>> upgrade the host, and then bring the VMs back up. There are some things
>> you just can't get around (like reboots for core components).
>
> That is worth considering, though managing the memory allocation between
> VMs might cause more of a hassle than the gain, especially on the machine
> for calculations which need lots of memory (currently 16GB).
>
> Thanks,
>
> --
> Dr. Andrew C. Aitchison        Computer Officer, DPMMS, Cambridge
> [email protected]      http://www.dpmms.cam.ac.uk/~werdna

--
Brandon Galbraith
Mobile: 630.400.6992
FNAL: 630.840.2141
