Re: [zfs-discuss] x4500 vs AVS ?
Jorgen Lundman wrote:
> If we were interested in finding a method to replicate data to a 2nd
> x4500, what other options are there for us?

If you already have an X4500, I think the best option for you is a cron job
with incremental 'zfs send'. Or rsync.

-- 
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA
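A minimal sketch of the cron-driven incremental 'zfs send' approach Ralf
suggests. The dataset name, target hostname, and snapshot rotation scheme
below are illustrative assumptions, not anything from the thread:

    #!/bin/sh
    # Hypothetical incremental replication of tank/data to a second X4500.
    # Assumes an earlier baseline snapshot @repl-prev already exists on both
    # hosts (sent once with a full 'zfs send | zfs receive').
    DS=tank/data
    TARGET=backup-x4500

    zfs snapshot $DS@repl-new
    zfs send -i $DS@repl-prev $DS@repl-new | ssh $TARGET zfs receive -F $DS

    # Rotate snapshot names so the next cron run sends another increment.
    zfs destroy $DS@repl-prev
    zfs rename $DS@repl-new $DS@repl-prev
    ssh $TARGET "zfs destroy $DS@repl-prev && zfs rename $DS@repl-new $DS@repl-prev"

Run from cron at whatever interval your tolerable replication lag allows;
rsync over the mounted filesystem is the simpler but slower alternative
mentioned above.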
Re: [zfs-discuss] [storage-discuss] A few questions
Am I right in thinking though that for every raidz1/2 vdev, you're effectively losing the storage of one/two disks in that vdev?
Re: [zfs-discuss] [storage-discuss] A few questions
On Wed, Sep 17, 2008 at 8:40 AM, gm_sjo wrote:
> Am I right in thinking though that for every raidz1/2 vdev, you're
> effectively losing the storage of one/two disks in that vdev?

Well yeah - you've got to have some allowance for redundancy.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] [storage-discuss] A few questions
2008/9/17 Peter Tribble:
>> Am I right in thinking though that for every raidz1/2 vdev, you're
>> effectively losing the storage of one/two disks in that vdev?
>
> Well yeah - you've got to have some allowance for redundancy.

This is what I'm struggling to get my head around - the chances of losing
two disks at the same time are pretty darn remote (within a reasonable
time-to-replace delta), so what advantage is there (other than potentially
pointless uber-redundancy) in running multiple raidz1/2 vdevs? Are you not
in fact losing performance by reducing the number of spindles used for a
given pool?
[zfs-discuss] zpool with multiple mirrors question
If 2 disks of a mirror fail, will the pool be faulted?

  NAME        STATE     READ WRITE CKSUM
  homez       ONLINE       0     0     0
    mirror    ONLINE       0     0     0
      c0t2d0  ONLINE       0     0     0
      c0t3d0  ONLINE       0     0     0
    mirror    ONLINE       0     0     0
      c0t4d0  ONLINE       0     0     0
      c0t5d0  ONLINE       0     0     0
    mirror    ONLINE       0     0     0
      c0t6d0  ONLINE       0     0     0
      c0t7d0  ONLINE       0     0     0

Thanks.

-- 
Francois
Re: [zfs-discuss] zpool with multiple mirrors question
Francois wrote:
> If 2 disks of a mirror fail, will the pool be faulted?
>
>   NAME        STATE     READ WRITE CKSUM
>   homez       ONLINE       0     0     0
>     mirror    ONLINE       0     0     0
>       c0t2d0  ONLINE       0     0     0
>       c0t3d0  ONLINE       0     0     0
>     mirror    ONLINE       0     0     0
>       c0t4d0  ONLINE       0     0     0
>       c0t5d0  ONLINE       0     0     0
>     mirror    ONLINE       0     0     0
>       c0t6d0  ONLINE       0     0     0
>       c0t7d0  ONLINE       0     0     0

If c0t6d0 and c0t7d0 both fail (i.e. both sides of the same mirror vdev),
then the pool will be unable to retrieve all the data stored in it. If
c0t6d0 and c0t3d0 both fail, there are still sufficient replicas of the
data available, because the failed disks are from different mirrors.

This applies to SVM as well: if you have a stripe of mirrors with a UFS
filesystem on top of it, you have the same availability issue.

-- 
Darren J Moffat
Re: [zfs-discuss] [storage-discuss] A few questions
On Wed, Sep 17, 2008 at 10:11 AM, gm_sjo wrote:
> This is what I'm struggling to get my head around - the chances of losing
> two disks at the same time are pretty darn remote (within a reasonable
> time-to-replace delta), so what advantage is there (other than potentially
> pointless uber-redundancy) in running multiple raidz1/2 vdevs? Are you not
> in fact losing performance by reducing the number of spindles used for a
> given pool?

No. The number of spindles is constant. The snag is that for random reads,
the performance of a raidz1/2 vdev is essentially that of a single disk.
(The writes are fast because they're always full-stripe; but so are the
reads.) So your effective random read performance is that of a single disk
times the number of raidz vdevs.

It's a tradeoff, as in all things. Fewer vdevs means less wasted space, but
lower performance.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
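As a concrete illustration of that tradeoff, here is a sketch of three ways
to lay out the same eight disks; the device names are placeholders, and the
capacity/IOPS comments just follow the rule of thumb above (roughly one
disk's worth of random-read performance per raidz vdev):

    # One 8-disk raidz2 vdev: ~6 disks of usable capacity, but random-read
    # performance of roughly a single disk.
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                             c1t4d0 c1t5d0 c1t6d0 c1t7d0

    # Two 4-disk raidz1 vdevs: still ~6 disks of usable capacity, with
    # random reads now spread across two vdevs.
    zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                      raidz1 c1t4d0 c1t5d0 c1t6d0 c1t7d0

    # Four 2-way mirrors: only ~4 disks of usable capacity, but the best
    # random-read scaling of the three layouts.
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
                      mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0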
Re: [zfs-discuss] [storage-discuss] A few questions
gm_sjo wrote:
> Are you not in fact losing performance by reducing the number of spindles
> used for a given pool?

This depends. Usually RAIDZ1/2 isn't a good performer when it comes to
random-access read I/O, for instance. If I wanted to scale performance by
adding spindles, I would use mirrors (RAID 10). If you want to scale
filesystem sizes, RAIDZ is your friend.

I once had the problem that I needed high random I/O performance and at
least an 11 TB filesystem on an X4500. Mirroring was out of the question
(not enough disk space left), and RAIDZ gave me only about 25% of the
performance of the existing Linux ext2 boxes I had to compete with. In the
end, striping 13 RAIDZ sets of 3 drives each + 1 hot spare delivered
acceptable results in both categories. But it took a lot of benchmarks to
get there.

-- 
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA
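For reference, a layout like the one Ralf describes would be created
roughly like this; the disk names are placeholders, not his actual X4500
device names:

    # 13 three-disk raidz1 vdevs striped together, plus one hot spare
    # (disk1..disk40 stand in for the real c#t#d# devices).
    zpool create tank \
        raidz1 disk1  disk2  disk3  \
        raidz1 disk4  disk5  disk6  \
        raidz1 disk7  disk8  disk9  \
        raidz1 disk10 disk11 disk12 \
        raidz1 disk13 disk14 disk15 \
        raidz1 disk16 disk17 disk18 \
        raidz1 disk19 disk20 disk21 \
        raidz1 disk22 disk23 disk24 \
        raidz1 disk25 disk26 disk27 \
        raidz1 disk28 disk29 disk30 \
        raidz1 disk31 disk32 disk33 \
        raidz1 disk34 disk35 disk36 \
        raidz1 disk37 disk38 disk39 \
        spare  disk40

Each narrow raidz1 vdev buys back most of the random-read concurrency that
one wide RAIDZ group loses, at the cost of one parity disk per vdev.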
Re: [zfs-discuss] zpool with multiple mirrors question
Darren J Moffat wrote:
> If c0t6d0 and c0t7d0 both fail (i.e. both sides of the same mirror vdev),
> then the pool will be unable to retrieve all the data stored in it. If
> c0t6d0 and c0t3d0 both fail, there are still sufficient replicas of the
> data available, because the failed disks are from different mirrors.

Thanks for the precision :)

-- 
Francois
Re: [zfs-discuss] [storage-discuss] iscsi target problems on snv_97
> I believe the problem you're seeing might be related to a deadlock
> condition (CR 6745310). If you run pstack on the iscsi target daemon you
> might find a bunch of zombie threads. The fix was putback to snv-99; give
> snv-99 a try.

Yes, a pstack of the core I've generated from iscsitgtd does have a number
of zombie threads. I'm afraid I can't make heads nor tails of the bug
report at http://bugs.opensolaris.org/view_bug.do?bug_id=6658836, nor its
duplicate-of 6745310, nor any of the related bugs (all are unavailable
except for 6676298, and the stack trace reported in that bug doesn't look
anything like mine).

As far as I can tell snv-98 is the latest build, from Sep 10 according to
http://dlc.sun.com/osol/on/downloads/. So snv-99 should be out next week,
correct? Anything I can do in the mean time? Do I need to BFU to the latest
nightly build? Or would just taking the iscsitgtd from that build suffice?

--Joe
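For anyone wanting to reproduce that check, a minimal sketch: iscsitgtd is
the real daemon name, but the exact core-file handling below is just one
way to do it:

    # Inspect the thread stacks of the running iscsi target daemon.
    pstack `pgrep -x iscsitgtd`

    # Or save a core first so the stacks can be examined after a restart;
    # gcore writes core.<pid> in the current directory.
    pid=`pgrep -x iscsitgtd`
    gcore $pid
    pstack core.$pid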
Re: [zfs-discuss] [storage-discuss] iscsi target problems on snv_97
Moore, Joe wrote:
> As far as I can tell snv-98 is the latest build, from Sep 10 according to
> http://dlc.sun.com/osol/on/downloads/. So snv-99 should be out next week,
> correct?

snv-99 should be out next week.

> Anything I can do in the mean time? Do I need to BFU to the latest
> nightly build? Or would just taking the iscsitgtd from that build suffice?

You could try snv-98. You don't need to bfu, just get the latest iscsitgtd.

-Tim
Re: [zfs-discuss] zpool with multiple mirrors question
>>>>> djm == Darren J Moffat writes:

  djm> If c0t6d0 and c0t7d0 both fail (ie both sides of the same mirror
  djm> vdev) then the pool will be unable to retrieve all the data stored
  djm> in it.

won't be able to retrieve ANY of the data stored on it. It's correct as you
wrote it, but you almost make it sound like you could get some data off the
pool, and one might reasonably hope to, but you can't. For example:

 1. zpool create pool mirror disk1 disk2
 2. pax -rwpe /somewhere/else /pool
 3. zpool add pool mirror disk3 disk4
    [don't write anything to the pool]
 4. [disk3 and disk4 both die]

You've now lost everything you copied onto the pool in step 2.

So, if you type 'zpool add pool disk3 disk4' and forget the 'mirror', your
mistake isn't such a small one. You have to quickly find disk5 and disk6 to
attach. (Happened to me, with 30-day-old disk3/disk4, at home with no
backup.)
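To make that last point concrete, a sketch of the commands involved (disk
names are placeholders); the attach commands at the end are the recovery
Miles alludes to, turning the accidentally-added single-disk vdevs into
mirrors:

    # Intended: add a second mirror vdev to the pool.
    zpool add pool mirror disk3 disk4

    # The mistake: without 'mirror', disk3 and disk4 are added as two
    # independent, unredundant top-level vdevs striped into the pool.
    zpool add pool disk3 disk4

    # Recovery: attach a new disk to each bare vdev so that each becomes a
    # two-way mirror before anything fails.
    zpool attach pool disk3 disk5
    zpool attach pool disk4 disk6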
[zfs-discuss] resilver keeps starting over? snv_95
Running Nevada build 95 on an ultra 40. Had to replace a drive. Resilver in progress, but it looks like each time I do a zpool status, the resilver starts over. Is this a known issue?
Re: [zfs-discuss] resilver keeps starting over? snv_95
On 17 September, 2008 - Neal Pollack sent me these 0,3K bytes:
> Running Nevada build 95 on an ultra 40. Had to replace a drive. Resilver
> in progress, but it looks like each time I do a zpool status, the
> resilver starts over. Is this a known issue?

I recall some issue with 'zpool status' as root restarting resilvering.
Doing it as a regular user will not.

/Tomas
-- 
Tomas Ögren, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
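A trivial sketch of the suggested workaround; the pool name and user
account are placeholders:

    # Check resilver progress from an unprivileged account instead of root,
    # since only root's 'zpool status' was reported to restart the resilver.
    su - someuser -c '/usr/sbin/zpool status tank'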
Re: [zfs-discuss] resilver keeps starting over? snv_95
>>>>> t == Tomas Ögren writes:

   t> I recall some issue with 'zpool status' as root restarting
   t> resilvering. Doing it as a regular user will not.

Is there an mdb command similar to zpool status? Maybe it's safer.
Re: [zfs-discuss] ZFS system requirements
Cyril Plisko wrote:
> On Wed, Sep 17, 2008 at 6:06 AM, Erik Trimble wrote:
>> Just one more thing on this: Run with a 64-bit processor. Don't even
>> think of using a 32-bit one - there are known issues with ZFS not quite
>> properly using 32-bit only structures. That is, ZFS is really 64-bit
>> clean, but not 32-bit clean.
>
> Wow! That's a statement. Can you provide more info on these 32-bit
> issues? I am not aware of any. In fact, besides being sluggish
> (presumably due to limited address space), I never noticed any issues
> with ZFS, which I used on a 32-bit machine for 2 years.

http://www.opensolaris.org/jive/thread.jspa?messageID=212508#212508

Looking through the bug database, it seems that a good chunk of
32-bit-related problems have been resolved. However, there hasn't been a
general fix for the overall issue noted in the above discussion.

-- 
Erik Trimble
Java System Support
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] resilver keeps starting over? snv_95
Are you doing snaps? If so, unless you have the new bits to handle the
issue, each snapshot restarts a scrub or resilver.

Thanks!
Wade Stuart

[EMAIL PROTECTED] wrote on 09/17/2008 01:07:53 PM:
> Running Nevada build 95 on an ultra 40. Had to replace a drive. Resilver
> in progress, but it looks like each time I do a zpool status, the
> resilver starts over. Is this a known issue?
Re: [zfs-discuss] resilver keeps starting over? snv_95
On 09/17/08 02:29 PM, [EMAIL PROTECTED] wrote:
> Are you doing snaps?

No, no snapshots ever. Logged in as root to do:

  zpool replace poolname deaddisk

and then did a few zpool status as root. It restarted each time.

> If so, unless you have the new bits to handle the issue, each snapshot
> restarts a scrub or resilver.
Re: [zfs-discuss] ZPOOL Import Problem
On Sep 16, 2008, at 5:39 PM, Miles Nordin wrote:

  jd == Jim Dunham writes:

  jd If at the time the SNDR replica is deleted the set was actively
  jd replicating, along with ZFS actively writing to the ZFS storage pool,
  jd I/O consistency will be lost, leaving the ZFS storage pool in an
  jd indeterministic state on the remote node. To address this issue, prior
  jd to deleting the replicas, the replica should be placed into logging
  jd mode first.

What if you stop the replication by breaking the network connection between
primary and replica? Consistent or inconsistent?

Consistent.

It sounds fishy, like ``we're always-consistent-on-disk with ZFS, but
please use 'zpool offline' to avoid disastrous pool corruption.''

This is not the case at all. Maintaining I/O consistency of all volumes in
a single I/O consistency group is an attribute of replication. The instant
an SNDR replica is deleted, that volume is no longer being replicated, and
it becomes inconsistent with all other write-order volumes. By placing all
volumes in the I/O consistency group into logging mode (not 'zpool
offline') and then deleting the replica, there is no way for any of the
remote volumes to become I/O inconsistent. Yes, one will note that there is
a group disable command, sndradm -g group-name -d, but it was implemented
for ease of administration, not as a write-order-coordinated disable
command.

  jd ndr_ii. This is an automatic snapshot taken before resynchronization
  jd starts,

Yeah, that sounds fine, possibly better than DRBD in one way because it
might allow the resync to go faster. From the PDFs it sounds like async
replication isn't done the same way as the resync, it's done safely, and
that it's even possible for async replication to accumulate hours of
backlog in a ``disk queue'' without losing write ordering, so long as you
use the ``blocking mode'' variant of async.

Correct reading of the documentation.

ii might also be good for debugging a corrupt ZFS, so you can tinker with
it but still roll back to the original corrupt copy. I'll read about
it---I'm guessing I will need to prepare ahead of time if I want ii
available in the toolbox after a disaster.

  jd AVS has the concept of I/O consistency groups, where all disks of a
  jd multi-volume filesystem (ZFS, QFS) or database (Oracle, Sybase) are
  jd kept write-order consistent when using either sync or async
  jd replication.

Awesome, so long as people know to use it. So I guess that's the answer for
the OP: use consistency groups!

I use the name of the ZFS storage pool as the name of the SNDR I/O
consistency group.

The one thing I worry about is, before, AVS was used between RAID and
filesystem, which is impossible now because that inter-layer area no longer
exists. If you put the individual device members of a redundant zpool vdev
into an AVS consistency group, what will AVS do when one of the devices
fails?

Nothing, as it is ZFS that reacts to the failed device.

Does it continue replicating the working devices and ignore the failed one?

In this scenario ZFS knows the device failed, which means ZFS will stop
writing to the disk, and thus to the replica.

This would sacrifice redundancy at the DR site. UFS-AVS-RAID would not do
that in the same situation. Or hide the failed device from ZFS and slow
things down by sending all reads/writes of the failed device to the remote
mirror?

This would slow down the primary site. UFS-AVS-RAID would not do that in
the same situation.
The latter ZFS-AVS behavior might be rescueable, if ZFS had the statistical
read-preference feature, but writes would still be massively slowed in this
scenario, while in UFS-AVS-RAID they would not be. To get back the level of
control one used to have for writes, you'd need a different zpool-level way
to achieve the intent of the AVS sync/async option. Maybe just a slog which
is not AVS-replicated would be enough, modulo other ZFS fixes for hiding
slow devices.

ZFS-AVS is not UFS-AVS-RAID, and although one can foresee some downsides to
replicating ZFS with AVS, there are some big wins:

 - Place SNDR in logging mode, zpool scrub the secondary volumes for
   consistency, then resume replication.
 - Compressed ZFS storage pools result in compressed replication.
 - Encrypted ZFS storage pools result in encrypted replication.

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.
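A hedged sketch of that verification workflow, assuming the SNDR
consistency group is named after the pool as Jim describes ('tank' is a
placeholder); the exact sndradm usage should be checked against the AVS
documentation:

    # On the primary: put every set in the group into logging mode, so
    # writes are tracked in the bitmap instead of being replicated.
    sndradm -g tank -l

    # On the secondary: the volumes are now quiescent and write-order
    # consistent, so the pool can be imported and verified.
    zpool import tank
    zpool scrub tank
    zpool export tank

    # Back on the primary: resume replication with an update (incremental)
    # resynchronization.
    sndradm -g tank -u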