There must be a reason for doing backups this way. Perhaps an explanation from Ethan would help...
-J

On Wed, Jun 4, 2008 at 11:45 PM, James C. McPherson <[EMAIL PROTECTED]> wrote:
>
> Hi Ethan,
> responses inline below
>
> Ethan Erchinger wrote:
>> Hello,
>> We have a backup strategy that involves mapping LUNs between a given
>> pair of hosts, and copying data from one of the LUNs (src) to another
>> LUN (dest). The src LUNs sit on a SAN device, sometimes multiple
>> devices (zpool mirror). The src LUN is running a MySQL database and
>> typically will be running for weeks without issue.
>
> I'm sorry, I don't quite understand how this can be a serious
> "backup strategy" -- how on earth did you get to thinking that
> it was going to work reliably?
>
>> When we start the backup sequence, we map a previously unmapped LUN to
>> the DB host and issue the following commands:
>>
>> root# cfgadm -al
>> (sleep 10)
>> root# luxadm probe
>> (sleep 10)
>> root# zpool import <pool_name>
>
> You're kidding, right? Have you RTFMd the cfgadm_fp(1M) manpage?
> Ever thought about running something similar to
>
> # cfgadm -c configure c$X::$target-pwwn
>
>> After importing we'll perform some minor IO on the dest LUN, such as
>> adding a symlink and removing some old configuration files. Then we'll
>> start an ibbackup of that database from the src LUN to the dest LUN,
>> and things go bad.
>
> Frankly, I'm surprised it takes this long for you to get to the
> "things go bad" stage.
>
>> It's not completely consistent, but sometimes the DB host will crash,
>> sometimes we'll get chksum/read/write errors on the src LUN. Looking
>> at dmesg (when the host doesn't crash), we see the LUN paths all
>> disappear and then reappear, usually around 20 seconds later. Example
>> output below. Each LUN has 2 paths out of the DB host and 4 paths on
>> each storage device, across two separate SANs.
>
> You're yanking drives in- and out-of-view of your host, you're
> doing so while zpool importing (and exporting?), and yet you still
> want your database to be reliable.
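For what it's worth, James's suggestion amounts to replacing the blind rescan-and-sleep sequence with an explicit, deterministic attach/import/export cycle. A minimal sketch follows; the controller number (c4), port WWN, and pool name are placeholder values for illustration, not details from the original post:

```shell
# Attach exactly the fabric device path we just mapped, instead of a
# blind "cfgadm -al" plus arbitrary sleeps (see cfgadm_fp(1M)).
# c4 and the target port WWN below are placeholders -- substitute your own.
cfgadm -c configure c4::50020f2300006077

# Confirm the new LUN is visible before touching ZFS.
luxadm probe

# Import the pool only once the device is attached and stable.
zpool import backup_pool

# ... run ibbackup from the src LUN to the dest LUN here ...

# Export the pool BEFORE unmapping the LUN on the array, so the host's
# view of its storage stays consistent.
zpool export backup_pool
cfgadm -c unconfigure c4::50020f2300006077
```

The key design point is symmetry: every LUN that is explicitly configured and imported is explicitly exported and unconfigured before the array-side mapping changes, so the host never sees paths vanish underneath a live pool.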
>> Usually the host will crash when not running with a zpool mirror,
>> which apparently in Sol10u4 is expected behavior.
>
> Sorry, but no. What you're doing is creating inconsistencies in the
> host's view of its storage. Don't blame Solaris for this; it's
> actually trying to keep your data consistent.
>
>> These hosts are x86_64 servers, running Sol10u4, unpatched. They use
>> QLogic qla2342 HBAs and the stock qlc driver. They are using MPxIO,
>> from what I can tell.
>
> Yes, they're using MPxIO. You can tell that from the pathnames
> such as /scsi_vhci/[EMAIL PROTECTED] - that's a dead giveaway.
>
> So ... _unpatched_ you say? _Why_? I know organisations generally
> have rigorous patching methodologies and schedules, but fer cryin'
> out loud, S10 Update 4 has been available since the middle of 2007.
> That's very nearly 12 months old.
>
>> If anyone has any tips on troubleshooting, or knows of things we are
>> doing wrong, help would be appreciated.
>
> Two major recommendations. Firstly, PATCH YOUR SYSTEM.
> Secondly, design a backup methodology which doesn't rely
> on playing the fool with your storage.
>
> Assuming that you're posting from your work email address,
> _surely_ you could convince your management to implement
> a backup strategy based around an enterprise-class backup
> package such as NetBackup or Networker.
>
> You should also seriously consider getting a professional
> services organisation (such as Sun's) to come in and help
> you get your systems set up properly.
>
> James C. McPherson
> --
> Senior Kernel Software Engineer, Solaris
> Sun Microsystems
> http://blogs.sun.com/jmcp  http://www.jmcp.homeunix.com/blog
> _______________________________________________
> storage-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/storage-discuss
