There must be a reason for doing backups this way. Perhaps an explanation from Ethan would help...
-J

On Wed, Jun 4, 2008 at 11:45 PM, James C. McPherson <[EMAIL PROTECTED]> wrote:
>
> Hi Ethan,
> responses inline below
>
> Ethan Erchinger wrote:
>> Hello,
>> We have a backup strategy that involves mapping LUNs between a given
>> pair of hosts, and copying data from one of the LUNs (src) to another
>> LUN (dest). The src LUNs sit on a SAN device, sometimes multiple
>> devices (zpool mirror). The src LUN is running a MySQL database and
>> typically will be running for weeks without issue.
>
> I'm sorry, I don't quite understand how this can be a serious
> "backup strategy" -- how on earth did you get to thinking that
> it was going to work reliably?
>
>> When we start the backup sequence, we map a previously unmapped LUN to
>> the DB host and issue the following commands:
>>
>> root# cfgadm -al
>> (sleep 10)
>> root# luxadm probe
>> (sleep 10)
>> root# zpool import <pool_name>
>
> You're kidding, right? Have you RTFMd the cfgadm_fp(1M) manpage?
> Ever thought about running something similar to
>
> # cfgadm -c configure c$X::$target-pwwn
>
>> After importing we'll perform some minor IO on the dest LUN, such as
>> adding a symlink and removing some old configuration files. Then we'll
>> start an ibbackup of that database from the src LUN to the dest LUN,
>> and things go bad.
>
> Frankly, I'm surprised it takes this long for you to get to the
> "things go bad" stage.
>
>> It's not completely consistent, but sometimes the DB host will crash,
>> sometimes we'll get chksum/read/write errors on the src LUN. Looking
>> at dmesg (when the host doesn't crash), we see the LUN paths all
>> disappear and then reappear, usually around 20 seconds later. Example
>> output below. Each LUN has 2 paths out of the DB host and 4 paths on
>> each storage device, across two separate SANs.
>
> You're yanking drives in- and out-of-view of your host, you're
> doing so while zpool importing (and exporting?), and yet you still
> want your database to be reliable.
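For what it's worth, James's suggestion amounts to replacing the blind rescan-and-sleep sequence with an explicit, deterministic attach/import/export cycle. A minimal sketch follows; the controller number (c4), port WWN, and pool name are placeholder values for illustration, not details from the original post:

```shell
# Attach exactly the fabric device path we just mapped, instead of a
# blind "cfgadm -al" plus arbitrary sleeps (see cfgadm_fp(1M)).
# c4 and the target port WWN below are placeholders -- substitute your own.
cfgadm -c configure c4::50020f2300006077

# Confirm the new LUN is visible before touching ZFS.
luxadm probe

# Import the pool only once the device is attached and stable.
zpool import backup_pool

# ... run ibbackup from the src LUN to the dest LUN here ...

# Export the pool BEFORE unmapping the LUN on the array, so the host's
# view of its storage stays consistent.
zpool export backup_pool
cfgadm -c unconfigure c4::50020f2300006077
```

The key design point is symmetry: every LUN that is explicitly configured and imported is explicitly exported and unconfigured before the array-side mapping changes, so the host never sees paths vanish underneath a live pool.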
>> Usually the host will crash when not running with a zpool mirror,
>> which apparently in Sol10u4 is expected behavior.
>
> Sorry, but no. What you're doing is creating inconsistencies in the
> host's view of its storage. Don't blame Solaris for this; it's
> actually trying to keep your data consistent.
>
>> These hosts are x86_64 servers, running Sol10u4, unpatched. They use
>> QLogic qla2342 HBAs and the stock qlc driver. They are using MPxIO,
>> from what I can tell.
>
> Yes, they're using MPxIO. You can tell that from the pathnames
> such as /scsi_vhci/[EMAIL PROTECTED] - that's a dead giveaway.
>
> So ... _unpatched_ you say? _Why_? I know organisations generally
> have rigorous patching methodologies and schedules, but fer cryin'
> out loud, S10 Update 4 has been available since the middle of 2007.
> That's very nearly 12 months old.
>
>> If anyone has any tips on troubleshooting, or knows of things we are
>> doing wrong, help would be appreciated.
>
> Two major recommendations. Firstly, PATCH YOUR SYSTEM.
> Secondly, design a backup methodology which doesn't rely
> on playing the fool with your storage.
>
> Assuming that you're posting from your work email address,
> _surely_ you could convince your management to implement
> a backup strategy based around an enterprise-class backup
> package such as NetBackup or Networker.
>
> You should also seriously consider getting a professional
> services organisation (such as Sun's) to come in and help
> you get your systems set up properly.
>
> James C. McPherson
> --
> Senior Kernel Software Engineer, Solaris
> Sun Microsystems
> http://blogs.sun.com/jmcp  http://www.jmcp.homeunix.com/blog
> _______________________________________________
> storage-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/storage-discuss
