On 08/12/12 05:48, Jordi Espasa Clofent wrote:
Hi all,

Most of my production machines look like this:

* Solaris 10 Update 7 (now we're starting to migrate to Update 10)
* All file systems on classic UFS except /opt, which is ZFS
* All the zones inside /opt/zones
* All the zones containing an app server (Glassfish)
* All the critical app data is managed/stored by a backend database, so there is no data inside the zone except the app (the Java files) itself

If we hit some serious, strange problem in a zone (or even in the global zone), we proceed as follows:

- re-create the zone on another server (the whole process is automated by backend scripts; it takes just 15-20 minutes)
Zone cloning can take this down to a few seconds, especially if /opt/zones is a separate ZFS file system from /opt. That arrangement makes it so that each zone gets its own file system (aka dataset), which is the key to making zone cloning go quickly (see the one-time setup sketch after the commands below).

Once you have that set up, you can create a master zone that is configured as far as it makes sense in your situation. That may or may not include the Glassfish installation and/or the J2EE application(s) being run by Glassfish. Then, when you need to create a new zone (or recreate another one that is damaged):

zoneadm -z badzone uninstall -F                                   # discard the damaged zone's files
zoneadm -z badzone clone master                                   # ZFS-clones the master's dataset
cp /mumble/badzone.sysidcfg /opt/zones/badzone/root/etc/sysidcfg  # per-zone identity for first boot
chmod 400 /opt/zones/badzone/root/etc/sysidcfg
zoneadm -z badzone boot
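
If /opt/zones is not already its own file system, the one-time setup is roughly this sketch (the pool name "optpool" is my assumption; substitute whatever pool backs /opt):

zfs create optpool/zones                       # parent dataset for all zonepaths
zfs set mountpoint=/opt/zones optpool/zones
zfs list -r optpool/zones                      # each installed zone then appears as a child dataset

With the zonepaths' parent directory on its own dataset, zoneadm gives each zone a dedicated dataset at install time, and clone can use a ZFS snapshot/clone instead of copying files.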

Depending on your environment, there may be other first-boot zone-specific setup that is required. You could extend this scheme so that each zone has a custom master that is never subjected to your workload. If something catastrophic happens to the zone that has the workload, you can always uninstall it then re-install from its master.
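
For reference, the sysidcfg copied in above might look something like this minimal example (every value here is a placeholder for illustration, not a recommendation):

system_locale=C
terminal=xterm
security_policy=NONE
name_service=NONE
nfs4_domain=dynamic
timezone=UTC
network_interface=NONE {
    hostname=badzone
}
root_password=<crypted-hash-from-/etc/shadow>

With that in place (and mode 400, since it holds a password hash), the zone's first boot skips the interactive sysid questions.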

To make it so that this is easy to do on a recovery system, you can make a copy of your master zone on some other system.

primary# zoneadm -z master detach                              # quiesce the zone so the copy is consistent
primary# zp=`zfs list -H -o name /opt/zones/master`            # dataset backing the zonepath
primary# zfs snapshot $zp@replicate
primary# zfs send -p $zp@replicate | ssh backup zfs recv $zp   # -p carries the dataset properties along
primary# zoneadm -z master attach

backup# zonecfg -z master create -a /opt/zones/master          # build the zone config from the detached zone
backup# zoneadm -z master attach
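
To sanity-check the transfer, the zone should now show up on the backup in the installed state:

backup# zoneadm list -cv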

Depending on your situation, you may or may not want to repeat this with your other zones. Whenever you patch, install new packages in the global zone, or otherwise significantly change the content in a zone that was previously copied, you probably want to redo the copy.
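
A sketch of refreshing an already-copied zone incrementally after such a change (snapshot names are arbitrary; you may need to detach the zone on the backup first so nothing holds its dataset busy):

primary# zoneadm -z master detach
primary# zfs snapshot $zp@replicate2
primary# zfs send -i @replicate $zp@replicate2 | ssh backup zfs recv -F $zp   # -F rolls the copy back to the common snapshot
primary# zoneadm -z master attach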

This works best if the packages and patches between primary and backup are exactly in sync. That allows you to move back to primary as well.
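
One quick way to compare the two hosts (showrev -p lists the installed patches on Solaris 10):

primary# showrev -p | sort > /tmp/patches.primary
backup# showrev -p | sort > /tmp/patches.backup

Copy one file to the other host and diff them. zoneadm attach validates packages and patches and will complain if the backup host is out of sync.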

- re-deploy the app in the zone (30-45 minutes)

So, in total, if zoneX crashes, getting up and running again takes around 45 minutes to 1 hour 5 minutes. That's acceptable for us, but obviously it would be amazing if we could reduce this time.

I'm just wondering if I can do this using some ZFS capability (such as snapshots), since all the zones live under /opt/zones, which, as I said, is ZFS.

Thanks in advance for all the suggestions.

I assume that you are planning for horrible what-if scenarios. If you are actually experiencing somewhat frequent issues that lead to you needing to rebuild the zones from scratch, you should really figure out the root cause. It could be that you have more serious problems than just the occasional misbehaving zone.

--
Mike Gerdts
Solaris Core OS / Zones                 http://blogs.oracle.com/zoneszone/

_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
