Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
Nathan, Keep in mind the iSCSI target is only in OpenSolaris at this time.

On 05/30/2007 10:15 PM, Nathan Huisman wrote: snip

= QUESTION #1 What is the best way to mirror two ZFS pools in order to achieve a sort of HA storage system? I don't want to have to physically swap my disks into another system if any of the hardware on the ZFS server dies. If I have the following configuration, what is the best way to mirror these in near real time? BOX 1 (JBOD-ZFS) BOX 2 (JBOD-ZFS) I've seen the zfs send and receive commands but I'm not sure how well that would work with a close to real time mirror.

If you want this to be redundant (and very scalable) you will want at least 2x BOX 1's and 2x BOX 2's. IPMP with redundant GbE switches + NICs as well. Do not use zfs send/recv. Use Sun Cluster 3.2 for HA-ZFS. http://docs.sun.com/app/docs/doc/820-0335/6nc35dge2?a=view There is potential for data loss if the active ZFS node crashes before outstanding transaction groups commit for non-synchronous writes, but the ZVOL (and the underlying ext3fs) should not become corrupt (hasn't happened to me yet). Can someone from the ZFS team comment on this?

= QUESTION #2 Can ZFS be exported via iSCSI, then imported as a disk to a Linux system and formatted with another file system? I wish to use ZFS as a block-level file system for my virtual machines, specifically using Xen. If this is possible, how stable is it?

This is possible and is stable in my experience. It scales well if you design your infrastructure correctly.

How is error checking handled if the ZFS volume is exported via iSCSI and then the block device is formatted to ext3? Will ZFS still be able to check for errors?

Yes, ZFS will detect/correct block-level errors in ZVOLs as long as you have a redundant zpool configuration (see the note below about LVM).

If this all works, are there ways to expand a ZFS iSCSI-exported volume and then expand the ext3 file system on the remote host?

I haven't tested it myself (yet), but it should be possible. You might have to export and re-import the iSCSI target on the Xen dom0 and then resize the ext3 partition (e.g. using 'parted'). If that doesn't work there are other ways to accomplish this.

= QUESTION #3 How does ZFS handle a bad drive? What process must I go through in order to take out a bad drive and replace it with a good one?

If you have a redundant zpool configuration you replace the failed disk and then issue a 'zpool replace'.

= QUESTION #4 What is a good way to back up this HA storage unit? Snapshots will provide an easy way to do it live, but should it be dumped into a tape library, or a third offsite ZFS pool using zfs send/receive, or ... ?

Send snapshots to another server that has a RAIDZ (or RAIDZ2) zpool (you want space over performance/redundancy for backup; the opposite of the *MIRRORS* you will want to use for the HA-ZFS cluster storage nodes). From this node you can dump to tape, etc.

= QUESTION #5 Does the following setup work? BOX 1 (JBOD) - iscsi export - BOX 2 ZFS. In other words, can I set up a bunch of thin storage boxes with low CPU and RAM instead of using SAS or FC to supply the JBOD to the ZFS server?

Yes. And ZFS+iSCSI makes this relatively cheap.

I very strongly recommend against using LVM to handle the mirroring. *You will lose the ability to correct data corruption* at the ZFS level. It also does not scale well, increases complexity, increases cost, and reduces throughput over iSCSI to your ZFS nodes. Leave volume management and redundancy to ZFS.

Set up your Xen dom0 boxes to have a redundant path to your ZVOLs over iSCSI. Send your data _one time_ to your ZFS nodes. Let ZFS handle the mirroring and then send that to your iSCSI LUNs on the storage nodes. Make sure you set up half of each mirror in the zpool with a disk from a separate storage node. Be wary of layering ZFS/ZVOLs like this. There are multiple ways to set up your storage nodes (plain iscsitadm or using ZVOLs), and if you use ZVOLs you may want to disable checksum there and leave that to your ZFS nodes.

Other:
- Others have reported that Sil3124-based SATA expansion cards work well with Solaris.
- Test your failover times between ZFS nodes (BOX 2s). Having lots of iSCSI shares/filesystems can cause this to be slow. Hopefully this will be improved with parallel zpool device mounting in the future.
- ZVOLs are not sparse by default. I prefer this, but if you really want to use sparse ZVOLs there is a switch for it in 'zfs create'.
- This will work, but TEST, TEST, TEST for your particular scenario.
- Yes, this can be built for less than $30k US for your storage size requirement.
- I get ~150MB/s throughput on this setup with 2 storage nodes of 6 disks each. It appears as a ~3TB mirror on the ZFS nodes.
- Use build 64 or later, as there is a ZVOL bug in b63 if I'm not mistaken. Probably a good idea to read through the open ZFS bugs, too.
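[Editor's sketch] For concreteness, a minimal sketch of the ZVOL/iSCSI pieces described above (pool, volume and disk names are invented; the shareiscsi property assumes an OpenSolaris build that includes the iSCSI target):

  # create a ZVOL on the ZFS node and export it as an iSCSI target for a Xen dom0
  zfs create -V 200G datapool/xen-dom1      # add -s for a sparse volume
  zfs set shareiscsi=on datapool/xen-dom1
  # Question #3: replace a failed disk in a redundant pool (same slot, new disk)
  zpool replace datapool c2t3d0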
Re: [zfs-discuss] current state of play with ZFS boot and install?
I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] current state of play with ZFS boot and install?
I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Directory
I wanted to know how ZFS finds the entry for a file in its directory object. Any links to the code would be highly appreciated. Thanks, regards kanishk ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
On Thu, 31 May 2007, Darren J Moffat wrote: Since you are doing iSCSI and may not be running ZFS on the initiator (client) then I highly recommend that you run with IPsec using at least AH (or ESP with Authentication) to protect the transport. Don't assume that your network is reliable. ZFS won't help you here if it isn't running on the iSCSI initiator, and even if it is it would need two targets to be able to repair.

[Hi Darren] That's a curious recommendation! You don't think that TCP/IP is reliable enough to provide iSCSI data integrity? What errors and error rates have you seen?

Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
Al Hopper wrote: On Thu, 31 May 2007, Darren J Moffat wrote: Since you are doing iSCSI and may not be running ZFS on the initiator (client) then I highly recommend that you run with IPsec using at least AH (or ESP with Authentication) to protect the transport. Don't assume that your network is reliable. ZFS won't help you here if it isn't running on the iSCSI initiator, and even if it is it would need two targets to be able to repair.

[Hi Darren] That's a curious recommendation! You don't think that TCP/IP is reliable enough to provide iSCSI data integrity?

No I don't. Also, I don't personally think that the access control model of iSCSI is sufficient, and I trust IPsec more in that respect. Personally I would actually like to see IPsec AH be the default for all traffic that isn't otherwise doing a cryptographically strong integrity check of its own.

What errors and error rates have you seen?

I have seen switches flip bits in NFS traffic such that the TCP checksum still matched yet the data was corrupted. One of the ways we saw this was when files were being checked out of SCCS: the SCCS checksum failed. Another way we saw it was the compiler failing to compile untouched code. Just as with ZFS we don't trust the HBA and the disks to give us correct data, with iSCSI the network is your HBA and cabling, and in part your disk controller as well. Defence in depth is a common mantra in the security geek world; I take that forward to protecting the data in transit too, even when it isn't purely for security reasons. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
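[Editor's sketch] For reference, what Darren is suggesting on Solaris would look roughly like an /etc/inet/ipsecinit.conf policy entry that applies AH to the iSCSI port (the address is invented; check ipsecconf(1M) for the exact keywords on your build):

  # apply AH to iSCSI traffic (TCP port 3260) between this host and the target
  { raddr 192.168.10.2 ulp tcp dport 3260 } ipsec { auth_algs any sa shared }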
Re: [zfs-discuss] ZFS Directory
Kanishk, Directories are implemented as ZAP objects. Look at the routines in that order : - zfs_lookup() - zfs_dirlook() - zfs_dirent_lock() - zap_lookup Hope that helps. Regards, Sanjeev. kanishk wrote: i wanted to know how does ZFS finds an entry of a file from its dirctory object. anylinks to the code will be highly appriciated. thankx regards kanishk ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Solaris Revenue Products Engineering, India Engineering Center, Sun Microsystems India Pvt Ltd. Tel:x27521 +91 80 669 27521 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] current state of play with ZFS boot and install?
I am running the zfsboot from b62. So far, it has been recommended that I not upgrade to a newer build. Malachi On 5/31/07, Mike Dotson [EMAIL PROTECTED] wrote: I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Which IO transfer size are zfs using when writing to disk ?
Erik Lund wrote: The parameters are: maxphys=0x200000, sd_max_xfer_size = 0x200000 and sd_max_xfer_size = 0x200000. Can I be sure that the IO transfer is 2MB or ?

Use iostat or another tool (DTrace iosnoop.d) to see for sure. Note that these are upper limits...

I want to line up my Sun ST6140 for optimal performance and therefore need to know the transfer size.

OK, but since this is zfs-discuss, I presume you know that ZFS uses a maximum block size of 128 kBytes. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
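[Editor's sketch] A quick way to see the I/O sizes actually being issued (nothing ST6140-specific; any device works): sample iostat and divide throughput by operation counts, for example:

  # average I/O size per device is roughly (kr/s + kw/s) / (r/s + w/s)
  iostat -xnz 5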
[zfs-discuss] Re: how to move a zfs file system between disks
It is not possible to use send and receive if the pool is not imported. It is however possible to use send and receive when the file system is not mounted. --chris This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
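[Editor's sketch] For example (pool and dataset names invented), moving an unmounted file system to another pool with send/receive:

  zfs unmount tank/data
  zfs snapshot tank/data@move
  zfs send tank/data@move | zfs receive newpool/data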
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
On 5/31/07, Darren J Moffat [EMAIL PROTECTED] wrote: Since you are doing iSCSI and may not be running ZFS on the initiator (client) then I highly recommend that you run with IPsec using at least AH (or ESP with Authentication) to protect the transport. Don't assume that your network is reliable. ZFS won't help you here if it isn't running on the iSCSI initiator, and even if it is it would need two targets to be able to repair. If you don't intend to encrypt the iSCSI headers / payloads, why not just use the header and data digests that are part of the iSCSI protocol? Thanks, - Ryan -- UNIX Administrator http://prefetch.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
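[Editor's sketch] On the Solaris initiator side the digests Ryan mentions can be turned on with iscsiadm; roughly (see iscsiadm(1M) for the exact option names on your build):

  # enable CRC32 header and data digests for the iSCSI initiator node
  iscsiadm modify initiator-node --headerdigest CRC32
  iscsiadm modify initiator-node --datadigest CRC32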
Re: [zfs-discuss] current state of play with ZFS boot and install?
Carl's request for a current state of play is a reasonable one. I have modified this page: http://www.opensolaris.org/os/community/zfs/boot/netinstall/ to include a list of status updates. I will keep it current so that anyone who wants to know how to install zfs boot using a netinstall can get a working combination of Solaris community release and the netinstall kit. Lori Malachi de Ælfweald wrote: I am running the zfsboot from b62. So far, it has been recommended that I not upgrade to a newer build. Malachi On 5/31/07, *Mike Dotson* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org http://opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS consistency guarantee
Hi Folks, how can I guarantee consistency for ZFS snapshots? If I am running a db or any other app on my ZFS and want to take a snapshot, is there any filesystem-equivalent command to quiesce the ZFS before taking a snapshot, or do I have to rely on the app itself? Can I do something like lockfs or the like? If I take snapshots on the storage, how can I guarantee consistency of those snapshots? Any methods to quiesce the FS after which I can take snapshots on storage? Thanks for any inputs. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS consistency guarantee
ganesh wrote: Hi Folks, how can i guarantee consistency for the ZFS snapshots?. If i am running a db or any other app on my ZFS and want to take a snapshot is there is any filesystem equivalent command to quiesce the ZFS before taking a snapshot or do i have to rely on the app itself?. You almost always have to quiesce the app in order to flush its buffers. Can i do something like lockfs or the like?. If i take snapshost on the storage, how can i guarantee consistency on those snapshosts?. Any methods to quiesce the FS after which i can take snapshosts on storage?. zfs snapshot -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Overview (rollup) of recent activity on zfs-discuss
For background on what this is, see:
http://www.opensolaris.org/jive/message.jspa?messageID=24416#24416
http://www.opensolaris.org/jive/message.jspa?messageID=25200#25200

= zfs-discuss 05/01 - 05/15 =

Size of all threads during period:

Thread size  Topic
-----------  -----
 36  gzip compression throttles system?
 26  Lots of overhead with ZFS - what am I doing wrong?
 22  Motley group of discs?
 15  ZFS Support for remote mirroring
 13  Need guidance on RAID 5, ZFS, and RAIDZ on home file server
 12  Optimal strategy (add or replace disks) to build a cheap and raidz?
 11  ZFS over a layered driver interface
 10  ZFS not utilizing all disks
 10  Resilvering speed?
  9  zfs boot image conversion kit is posted
  8  How does ZFS write data to disks?
  7  Will this work?
  7  Filesystem Benchmark
  6  setup_install_server, cpio and zfs : fix needed ?
  6  ZFS vs UFS2 overhead and may be a bug?
  6  ZFS Storage Pools Recommendations for Productive Environments
  6  Odd zpool create error
  5  recovered state after system crash
  5  Zpool, RaidZ how it spreads its disk load?
  5  Samba and ZFS ACL Question
  5  Remove files when at quota limit
  5  Lost in boot loop..
  5  Is this a workable ORACLE disaster recovery solution?
  4  zpool create -f ... fails on disk with previous
  4  zfs and jbod-storage
  4  does every fsync() require O(log n) platter-writes?
  4  ZFS raid on removable media for backups/temporary use possible?
  4  ZFS Snapshot destroy to
  4  Permanently removing vdevs from a pool
  4  Issue with adding existing EFI disks to a zpool
  4  A quick ZFS question: RAID-Z Disk Replacement + Growth ?
  4  ZFS: Under The Hood at LOSUG (16/05/07)
  3  zpool create -f ... fails on disk with previous UFS on it
  3  iscsitadm local_name in ZFS
  3  Will this work?]
  3  Very Large Filesystems
  3  Q: recreate pool?
  3  Multiple filesystem costs? Directory sizes?
  3  Force rewriting of all data, to push stripes onto newly added devices?
  3  Extremely long ZFS destroy operations
  3  Clear corrupted data
  3  Boot disk clone with zpool present
  3  Automatic rotating snapshots
  2  zfs tcsh command completion
  2  zfs lost function
  2  zdb -l goes wild about the labels
  2  tape-backup software (was: Very Large Filesystems)
  2  snv63: kernel panic on import
  2  Solaris Backup Server
  2  Motley group of discs? (doing it right, or right now)
  2  External eSata ZFS raid possible?
  2  Best way to migrate filesystems to ZFS?
  2  ARC, mmap, pagecache...
  2  3320 JBOD setup
  1  zpool status faulted, but raid1z status is online?
  1  zpool list and df -k difference
  1  zpool import - arc problem?
  1  zpool command causes a crash of my server
  1  zfs send/receive question
  1  zfs performance on fuse (Linux) compared to other fs
  1  zfs dataset option relations
  1  thoughts on ZFS copies
  1  simple Raid-Z question
  1  crash
  1  ZFS with raidz
  1  ZFS in S10update 4
  1  ZFS improvements
  1  ZFS and Oracle db production deployment
  1  ZFS Boot: Dividing up the name space
  1  Who modified my ZFS receive destination?
  1  Summary: Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
  1  Optimal strategy (add or replace disks) tobuild a cheap and raidz?
  1  Optimal strategy (add or replace disks) to build acheap and raidz?
  1  Move data from the zpool (root) to a zfs file system
  1  Filesystem full not reported in /var/adm/messages
  1  Benchmarking
  1  Benchmark which models ISP workloads
  1  B62 AHCI and ZFS

Posting activity by person for period:

# of posts  By
----------  --
 15  matthew.ahrens at sun.com (matthew ahrens)
 15  ian at ianshome.com (ian collins)
 14  richard.elling at sun.com (richard elling)
 13  rmilkowski at task.gda.pl (robert milkowski)
 11  me at tomservo.cc (mario goebbels)
 11  marko at cognistudio.com (marko milisavljevic)
 10  jk at tools.de (jürgen keil)
  7  al at logical-approach.com (al hopper)
  6  toby at smartgames.ca (toby thain)
  6  tmcmahon2
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
zfs-boot crowd: I said I'd try to come up with a procedure for liveupgrading the netinstalled zfs-root setup, but I haven't found time to do so yet (I'm focusing on getting this supported in install for real). So while I hate to retreat into the I never said you could upgrade this configuration excuse, that's what I'm going to do, at least for now. I might get a chance to work on a liveupgrade procedure in the next couple of weeks. In the meantime, if someone else wants to take a shot at it and post the results, go ahead. Lori Malachi de Ælfweald wrote: No, I did mean 'snapshot -r' but I thought someone on the list said that the '-r' wouldn't work until b63... hmmm... Well, realistically, all of us new to this should probably know how to patch our system before we put any useful data on it anyway, right? :) Thanks, Mal On 5/25/07, *Constantin Gonzalez* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Hi Malachi, Malachi de Ælfweald wrote: I'm actually wondering the same thing because I have b62 w/ the ZFS bits; but need the snapshot's -r functionality. you're lucky, it's already there. From my b62 machine's man zfs: zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED] Creates a snapshot with the given name. See the Snapshots section for details. -rRecursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time. Or did you mean send -r? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS consistency guarantee
how can i guarantee consistency for the ZFS snapshots?. Filesystem consistency or application/data consistency? If i am running a db or any other app on my ZFS and want to take a snapshot is there is any filesystem equivalent command to quiesce the ZFS before taking a snapshot or do i have to rely on the app itself?. Because ZFS is taking the snapshot, it is able to guarantee filesystem consistency. However, it cannot speak to the data or application contents. You have to do that, and ensure it has a consistent on-disk image at the time of the snapshot. This is the same as any other snapshot or copy technique would require. Can i do something like lockfs or the like?. If i take snapshost on the storage, how can i guarantee consistency on those snapshosts?. Any methods to quiesce the FS after which i can take snapshosts on storage?. At the filesystem level, that's all taken care of. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
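[Editor's sketch] A minimal sketch of the pattern Darren describes (pool and dataset names invented; the quiesce steps are entirely application-specific, the Oracle commands are only an example):

  # 1. quiesce the application, e.g. ALTER DATABASE BEGIN BACKUP for Oracle
  # 2. take the snapshot (ZFS guarantees it is filesystem-consistent)
  zfs snapshot datapool/oradata@nightly
  # 3. resume the application, e.g. ALTER DATABASE END BACKUP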
[zfs-discuss] zpool relayout
Just a quick question. If I create a raidz pool but then later find that I need more space I can add another raidz set to the pool but what happens to data already in the pool? Does a relayout occur or does zfs work towards balancing I/O to the pool across the 2 raidz sets only as new data is written? Also, is it possible to explicitly request a relayout; for example can I convert a raidz1 pool to a raidz2 pool? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool relayout
Just a quick question. If I create a raidz pool but then later find that I need more space I can add another raidz set to the pool but what happens to data already in the pool? Does a relayout occur or does zfs work towards balancing I/O to the pool across the 2 raidz sets only as new data is written? Technically, raidz describes a vdev in a pool, not a pool itself. So yes, you can add another raidz to the pool. New data is striped across both components, but weighted to the empty one to try to balance things out a bit over time. No relayout occurs. Also, is it possible to explicitly request a relayout; for example can I convert a raidz1 pool to a raidz2 pool? Not today. My assumption is that other items (like zpool shrink/evacuation) are being targeted as a higher priority. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
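[Editor's sketch] For example (device names invented), growing a pool by adding a second raidz vdev:

  # original pool with one raidz vdev
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
  # later: add a second raidz vdev; new writes stripe across both, existing data stays put
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0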
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
Al Hopper wrote: On Thu, 31 May 2007, David Anderson wrote: snip . Other: -Others have reported that Sil3124 based SATA expansion cards work well with Solaris.

[Sorry - don't mean to hijack this interesting thread] I believe that there is a serious bug with the si3124 driver that has not been addressed. Ben Rockwood and I have seen it firsthand, and a quick look at the Hg logs shows that si3124.c has not been changed in 6 months. Basic description of the bug: under heavy load (lots of I/O ops/Sec) all data from the drive(s) will completely stop for an extended period of time - 60 to 90+ Seconds. There was a recent discussion of the same issue on the Solaris on x86 list ([EMAIL PROTECTED]) - several experienced x86ers have seen this bug and found the current driver unusable. Interestingly, one individual said (paraphrased) ... don't see any issues and then later ... now I see it and it was there the entire time.

Recommendation: If you plan to use the 3124 driver, test it yourself under heavy load. A simple test with one disk drive will suffice. In my case, it was plainly obvious with one (ex Sun M20) drive and a UFS filesystem - all I was doing was tarring up /export/home to another drive. Periodically the tar process would simply stop (iostat went flatline) - it looked like the system was going to crash - then (after 60+ Secs) the tar process continued as if nothing had happened. This was repeated 4 or 5 times before the 'tar cvf' (of around 40Mb of data) completed successfully.

Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Does the si3124 bug Al Hopper mentioned have something to do with the ERROR below? I hit it in the warlock build step of my workspace, but I did not change the si3124 code at all...

warlock -c ../../common/io/warlock/si3124.wlcmd si3124.ll \
../sd/sd.ll ../sd/sd_xbuf.ll \
-l ../scsi/scsi_capabilities.ll -l ../scsi/scsi_control.ll -l ../scsi/scsi_watch.ll -l ../scsi/scsi_data.ll -l ../scsi/scsi_resource.ll -l ../scsi/scsi_subr.ll -l ../scsi/scsi_hba.ll -l ../scsi/scsi_transport.ll -l ../scsi/scsi_confsubr.ll -l ../scsi/scsi_reset_notify.ll \
-l ../cmlb/cmlb.ll \
-l ../sata/sata.ll \
-l ../warlock/ddi_dki_impl.ll
The following variables don't seem to be protected consistently: dev_info::devi_state
*** Error code 10
make: Fatal error: Command failed for target `si3124.ok'
Current working directory /net/greatwall/workspaces/wifi_rtw/usr/src/uts/intel/si3124
*** Error code 1
The following command caused the error: cd ../si3124; make clean; make warlock
make: Fatal error: Command failed for target `warlock.sata'
Current working directory /net/greatwall/workspaces/wifi_rtw/usr/src/uts/intel/warlock

- Michael ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool relayout
On 31 May, 2007 - Vic Engle sent me these 0,6K bytes: Just a quick question. If I create a raidz pool but then later find that I need more space I can add another raidz set to the pool but what happens to data already in the pool? Does a relayout occur or does zfs work towards balancing I/O to the pool across the 2 raidz sets only as new data is written? If you have a raidz of say 500G, filled with 300G of data.. then you add another raidz of 500G and start writing.. ZFS will put more data on the second raidz thing to even out the distribution.. Also, is it possible to explicitly request a relayout; for example can I convert a raidz1 pool to a raidz2 pool? Currently no. /Tomas -- Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS vs UFS performance measurement
Hi folks, We have the following disks and we want to create a STRIPE: c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPEs perform: a STRIPE created using Solaris Volume Manager, and a STRIPE created with ZFS. How can I achieve the exact same STRIPE (w.r.t. the interleaving, stripe size, etc.)? We want to make sure that the two STRIPE configurations are identical. Any pointers? Thanks in Advance _D This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS vs UFS performance measurement
Durga Deep Tirunagari wrote: Hi folks, We have the following disks and we want to create a STRIPE: c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPEs perform: a STRIPE created using Solaris Volume Manager, and a STRIPE created with ZFS. How can I achieve the exact same STRIPE (w.r.t. the interleaving, stripe size, etc.)? We want to make sure that the two STRIPE configurations are identical. Any pointers?

This is not possible because SVM uses a fixed stripe allocation and ZFS uses dynamic stripe allocation. Reads and writes for ZFS will only use the number of disks needed, not all of the disks (every time) like SVM. In other words, it is like comparing apples and oranges, at least as far as the RAID implementations are concerned. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
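[Editor's sketch] The closest equivalent setups look roughly like this (device names are from the original post; the 128k interlace is only an example chosen to match ZFS's default 128k recordsize):

  # SVM: single 12-way stripe with a fixed 128k interlace
  metainit d10 1 12 c7t2d0s0 c7t3d0s0 c7t4d0s0 c7t5d0s0 c7t8d0s0 c7t9d0s0 \
      c8t2d0s0 c8t3d0s0 c8t4d0s0 c8t5d0s0 c8t8d0s0 c8t9d0s0 -i 128k
  newfs /dev/md/rdsk/d10

  # ZFS: dynamic stripe across the same disks; allocation is decided per write
  zpool create tank c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 \
      c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0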
[zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
I've had at least some success (tried it once so far) doing a BFU to cloned filesystem from a b62 zfs root system, I could probably document that if there is interest. I have not tried taking a new ISO and installing the new packages ontop of a cloned fileystem though. On 5/31/07, Lori Alt [EMAIL PROTECTED] wrote: zfs-boot crowd: I said I'd try to come up with a procedure for liveupgrading the netinstalled zfs-root setup, but I haven't found time to do so yet (I'm focusing on getting this supported in install for real). So while I hate to retreat into the I never said you could upgrade this configuration excuse, that's what I'm going to do, at least for now. I might get a chance to work on a liveupgrade procedure in the next couple of weeks. In the meantime, if someone else wants to take a shot at it and post the results, go ahead. Lori Malachi de Ælfweald wrote: No, I did mean 'snapshot -r' but I thought someone on the list said that the '-r' wouldn't work until b63... hmmm... Well, realistically, all of us new to this should probably know how to patch our system before we put any useful data on it anyway, right? :) Thanks, Mal On 5/25/07, *Constantin Gonzalez* [EMAIL PROTECTED] mailto: [EMAIL PROTECTED] wrote: Hi Malachi, Malachi de Ælfweald wrote: I'm actually wondering the same thing because I have b62 w/ the ZFS bits; but need the snapshot's -r functionality. you're lucky, it's already there. From my b62 machine's man zfs: zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED] Creates a snapshot with the given name. See the Snapshots section for details. -rRecursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time. Or did you mean send -r? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs migration
Sorry to bother you but something is not clear to me regarding this process. OK, let's say I have two internal disks (73gb each) and I am mirroring them... now I want to replace those two mirrored disks with one LUN that is on SAN and is around 100gb. I do meet the requirement of having more than 73gb of storage, but do I need only something like 73gb at minimum, or do I actually need two LUNs of 73gb or more since I have it mirrored? My goal is simply to move the data off two mirrored disks onto one single SAN device... Any ideas if what I am planning to do is doable? Or do I need to use zfs send and receive and just update everything and switch when I am done? Or do I just add this SAN disk to the existing pool and then remove the mirror somehow? I would just have to make sure that all data is off that disk... is there any option to evacuate data off that mirror? Here is what I exactly have:

bash-3.00# zpool list
NAME      SIZE    USED    AVAIL    CAP    HEALTH    ALTROOT
mypool    68G     52.9G   15.1G    77%    ONLINE    -
bash-3.00# zpool status
  pool: mypool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
errors: No known data errors
bash-3.00#

On Tue, 29 May 2007, Cyril Plisko wrote: On 5/29/07, Krzys [EMAIL PROTECTED] wrote: Hello folks, I have a question. Currently I have a zfs pool (mirror) on two internal disks... I wanted to connect that server to SAN, then add more storage to this pool (double the space), then start using it. Then what I wanted to do is just take the internal disks out of that pool and use SAN only. Is there any way to do that with zfs pools? Is there any way to move data from those internal disks to external disks?

You can zpool replace your disks with other disks, provided that you have the same number of new disks and they are of the same or greater size -- Regards, Cyril

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS vs UFS performance measurement
Richard Elling wrote: Durga Deep Tirunagari wrote: Hi folks, We have the following disks and we want to create a STRIPE: c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPEs perform: a STRIPE created using Solaris Volume Manager, and a STRIPE created with ZFS. How can I achieve the exact same STRIPE (w.r.t. the interleaving, stripe size, etc.)? We want to make sure that the two STRIPE configurations are identical. Any pointers?

This is not possible because SVM uses a fixed stripe allocation and ZFS uses dynamic stripe allocation. Reads and writes for ZFS will only use the number of disks needed, not all of the disks (every time) like SVM. In other words, it is like comparing apples and oranges, at least as far as the RAID implementations are concerned. -- richard

Hi Richard, Please suggest an alternative? Please advise on what's the best course of action. _D ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Jason King wrote: I've had at least some success (tried it once so far) doing a BFU to cloned filesystem from a b62 zfs root system, I could probably document that if there is interest. Yep, been there too, weather's nice :-) http://blogs.sun.com/timf/entry/an_easy_way_to_manage (and previously http://blogs.sun.com/timf/entry/zfs_mountrootadm ) I have not tried taking a new ISO and installing the new packages ontop of a cloned fileystem though. I seem to remember trying something like that before, didn't work. I suspect there's more to it than that unfortunately - would love to have the time to play about more with upgrade hacks. cheers, tim On 5/31/07, *Lori Alt* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: zfs-boot crowd: I said I'd try to come up with a procedure for liveupgrading the netinstalled zfs-root setup, but I haven't found time to do so yet (I'm focusing on getting this supported in install for real). So while I hate to retreat into the I never said you could upgrade this configuration excuse, that's what I'm going to do, at least for now. I might get a chance to work on a liveupgrade procedure in the next couple of weeks. In the meantime, if someone else wants to take a shot at it and post the results, go ahead. Lori Malachi de Ælfweald wrote: No, I did mean 'snapshot -r' but I thought someone on the list said that the '-r' wouldn't work until b63... hmmm... Well, realistically, all of us new to this should probably know how to patch our system before we put any useful data on it anyway, right? :) Thanks, Mal On 5/25/07, *Constantin Gonzalez* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] mailto: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Hi Malachi, Malachi de Ælfweald wrote: I'm actually wondering the same thing because I have b62 w/ the ZFS bits; but need the snapshot's -r functionality. you're lucky, it's already there. From my b62 machine's man zfs: zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED] Creates a snapshot with the given name. See the Snapshots section for details. -rRecursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time. Or did you mean send -r? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops http://blogs.sun.com/timf ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
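[Editor's sketch] The clone-then-upgrade idea Jason and Tim describe looks roughly like this (dataset names invented; as Lori notes the procedure is not yet supported, so treat it as a sketch only):

  # snapshot the running b62 root and clone it as an alternate boot environment
  zfs snapshot rootpool/rootfs@b62
  zfs clone rootpool/rootfs@b62 rootpool/rootfs-new
  # BFU (or pkgadd the new bits) into the clone, update the GRUB entry to
  # point at the clone, then boot it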
Re: [zfs-discuss] ZFS vs UFS performance measurement
[EMAIL PROTECTED] wrote: Richard Elling wrote: Durga Deep Tirunagari wrote: Hi folks, We have the following disks :and we want to create a STRIPE c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPES perform STRIPE ( Created using Solaris Volume Manager ) STRIPE created with ZFS. How can I acheive the exact STRIPE ( w.r.t to the interleaving, stripe size, etc... ). We want to make sure that the two STRIPE configurations are identical. Any pointers ? This is not possible because SVM uses a fixed stripe allocation and ZFS uses dynamic stripe allocation. Reads and writes for ZFS will only use the number of disks needed, not all of the disks (every time) like SVM. In other words, it is like comparing apples and oranges, at least as far as RAID implementations is concerned. -- richard hi Richard, Please suggest an alternative ?. Please advice on whats the best course of action What are you trying to accomplish? The PAE group in Sun has a team working on ZFS performance and characterization. Some results have been blogged externally. It may be that your question is already answered someplace. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Delegated Administration?
Is it possible to give a user control of a ZFS filesystem such that the user can create their own file systems within it, take snapshots, etc.? Thanks, Haik This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not?Thoughts.Considerations.]
Hi Mike, more thoughts below... Ellis, Mike wrote: Hey Richard, thanks for sparking the conversation... This is a very interesting topic (especially if you take it out of the HPC we need 1000 servers to have this minimal boot image space into general purpose/enterprise computing) CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. These will be more interesting because they are really fast and can employ more sophisticated data protection methods -- like magnetic disk drives :-) Based on your earlier note, it appears you're not planning to use cheapo free after rebate CF cards :-) (The cheap-ones would probably be perfect for ZFS a-la cheap-o-JBOD). The price of flash memory has dropped by 50% this year. Expect this trend to follow Moore's law. Having boot disks mirrored across controllers has had sys-admins sleep better over the years (especially in FC-loop-cases with both drives on the same loop... Sigh). If the USB-bus one might hang these fancy FC-cards on is robust enough then perhaps a single battle hardened CF-card will suffice... (although zfs ditto-blocks or some form of protection might still be considered a good thing?) Having 2 cards would certainly make the unlikely replacement of a card a LOT more straight-forward than a single-card failure... Much of this would depend on the quality of these CF-cards and how they put up under load/stress/time Disagree. With two cards, you have to implement software mirroring of some sort. While ZFS is a step in the right direction (simplifying the process) it is unproven for long term system administration. The costs of implementing software mirroring occur in the complexity of managing the software environment over time as upgrades and patches occur. Reliability tends to trump availability for this reason. -- If we're going down this CF-boot path, many of us are going to have to re-think our boot-environment quite a bit. We've been spoiled with 36+ GB mirrored-boot drives for some time now (if you do a lot of PATCHING, you'll find that even those can get tight But that's a discussion for a different day) I don't think most enterprise boot disk layouts are going to fit (even unmirrored) onto a single 4GB CF-card. So we'll have to play some games where we start splitting off /opt, /var, (which is fairly read-write intensive when you have process-accounting etc. running) onto some other non-CF filesystem (likely a SAN of some variety). At some point the hackery a 4GB CF-card is going to force us to do, is going to become more complex than just biting the bullet and doing a full multipath-ed SAN-boot calling it a day. (or perhaps some future iSCSI/NFS boot for the SAN-averse) 4 GBytes is possible, but 8 GBytes ( $100 today) will be more common. 16 GByte CFs are still above $100... wait a few months. These are often used for the high-end digital cameras, where there is no redundancy, so the photography sites might be a good source of quality evaluations. Seriously though... If (say in some HPC/grid space?) you can stick your ENTIRE boot environment onto a 4GB CF-card, why not just do the SAN, NFS/iSCSI boot thing instead? (what ever happened to: http://blogs.sun.com/dweibel/entry/sprint_snw_2006#comments ) Good question. You can build an NFS service which is much more reliable than a disk, quite easily in fact. Some people get all upset about that, though. N.B. a client only needs the NFS service to be available when an I/O operation is started. 
Once you boot and have been running for a while, most stuff should be cached in main memory and your reliance on the NFS boot server is reduced. This makes analysis of the reliability of such systems difficult. -- But lets explore the CF thing some more... There is something there, although I think Sun might have to provide some best-practices/suggestions as to how customers that don't run a minimum-config-no-local-apps, pacct, monitoring, etc. solaris environment are best to use something like this. Use it as a pivot boot onto the real root-image? That would delegate the CF-card to little more than a rescue/utility image Kinda cool, but not earth-shattering I would think (especially for those already utilizing wanboot for such purposes) On my list of things to do is measure the actual block reuse patterns. For ZFS, this isn't really interesting because of the COW. For UFS, we do expect some hot spots. But even then, there is some debate over whether the problems will hit in metadata first (file appends do not rewrite original data, so logs aren't interesting). Since UFS metadata is not redundant (unlike ZFS) the issues may get tricky. Somewhere on my list of things to do... and it isn't a trivial data collection exercise. -- Splitting off /var and friends from the boot environment (and still packing the boot env say on a ditto-block 4GB FC card)
Re: [zfs-discuss] zfs migration
Krzys wrote: Sorry to bother you but something is not clear to me regarding this process.. Ok, lets sat I have two internal disks (73gb each) and I am mirror them... now I want to replace those two mirrored disks into one LUN that is on SAN and it is around 100gb. Now I do meet one requirement of having more than 73gb of storage but do I need only something like 73gb at minimum or do I actually need two luns of 73gb or more since I have it mirrored? You can attach any number of devices to a mirror. You can detach all but one of the devices from a mirror. Obviously, when the number is one, you don't currently have a mirror. The resulting logical size will be equivalent to the smallest device. My goal is simple to move data of two mirrored disks into one single SAN device... Any ideas if what I am planning to do is duable? or do I need to use zfs send and receive and just update everything and switch when I am done? or do I just add this SAN disk to the existing pool and then remove mirror somehow? I would just have to make sure that all data is off that disk... is there any option to evacuate data off that mirror? The ZFS terminology is attach and detach A replace is an attach followed by detach. It is a good idea to verify that the sync has completed before detaching. zpool status will show the current status. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
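[Editor's sketch] Spelled out for the case in this thread (verify with zpool status that the resilver has finished before detaching anything):

  # attach the SAN LUN to the existing mirror, let it resilver, then drop the old disks
  zpool attach mypool c1t2d0 emcpower0a
  zpool status mypool        # wait for the resilver to complete
  zpool detach mypool c1t2d0
  zpool detach mypool c1t3d0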
Re: [zfs-discuss] Delegated Administration?
Haik Aftandilian wrote: Is it possible to give a user control of a ZFS filesystem such that the user can create their own file systems within it, take snapshots, etc.? Thanks, Haik Support for this should be available within the next month or two. You should check out PSARC/2006/465 http://www.opensolaris.org/jive/thread.jspa?messageID=47766 -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
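[Editor's sketch] For reference, once that work integrates, the delegation is expected to look roughly like this (user and dataset names invented; see the PSARC case for the final syntax):

  # let user 'haik' create file systems, mount them and take snapshots under tank/home/haik
  zfs allow haik create,mount,snapshot tank/home/haik
  zfs allow tank/home/haik        # show the delegated permissions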
Re: [zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not?Thoughts.Considerations.]
Richard Elling wrote: CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. Timing is everything... a new standard might help... let's call it miCard http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=199703805 -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs boot error recovery
hi all, I would like to ask some questions regarding best practices for ZFS recovery if disk errors occur. Currently I have zfs boot (nv62) and the following setup: 2 si3124 controllers (each with 4 SATA disks), 8 SATA disks, same size, same type. I have two pools: a) rootpool b) datapool. The rootpool is a mirrored pool, where every disk has a slice (s0, which is 5% of the whole disk) devoted to the rootpool, just for mirroring. The rest of each disk (s1) is added to the datapool, which is raidz. My idea is that if any disk is corrupt I am still able to boot. Now I have some questions:

a) If I want to be able to boot from every disk in case of error, I have to set up grub on every disk, so that if the controller picks that disk as the boot disk, the rootpool can still be loaded from it.

b) What is the best way to replace a disk as fast as possible? Adding a disk as a hot spare for the raidz is a good idea, but I would also like to replace the disk during runtime as simply as possible. The problem is that for the root pool the disks are labeled (the slices thingy), so I cannot simply detach the volumes, replace the disk and attach them again; I have to format the disk so that the slicing exists. Is there some clever way to automatically re-label a replacement disk?

c) si3124-related question: is it possible to simply hot swap the disk? (I have the disks in special hot-swappable units, but have no experience with hot swapping under Solaris, so I would like some feedback.)

d) Do you have best practices for systems like the above? What are the best resources on the web for learning about monitoring the health of a ZFS system (like email notifications in case of disk failures...)?

Thanks in advance -- Jakob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
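[Editor's sketch] On question (a), installing grub onto each disk of the mirrored root pool looks like this (the disk name is an example; repeat for every disk whose s0 slice belongs to the rootpool):

  # x86: put the grub stages into the boot slice of each root-pool disk
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0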
Re: [zfs-discuss] Delegated Administration?
Support for this should be available within the next month or two. You should check out PSARC/2006/465 http://www.opensolaris.org/jive/thread.jspa?messageID=47766 This is what I was looking for. Thanks, Haik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs migration
Hmm, I am having some problems, I did follow what you suggested and here is what I did: bash-3.00# zpool status pool: mypool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirrorONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 errors: No known data errors bash-3.00# zpool detach mypool c1t3d0 bash-3.00# zpool status pool: mypool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM mypool ONLINE 0 0 0 c1t2d0ONLINE 0 0 0 errors: No known data errors so now I have only one disk in my pool... Now, the c1t2d0 disk is a 72fb SAS drive. I am trying to replace it with SAN 100GB LUN (emcpower0a) bash-3.00# format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c1t0d0 SUN72G cyl 14087 alt 2 hd 24 sec 424 /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 1. c1t1d0 SUN72G cyl 14087 alt 2 hd 24 sec 424 /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 2. c1t2d0 SEAGATE-ST973401LSUN72G-0556-68.37GB /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 3. c1t3d0 FUJITSU-MAY2073RCSUN72G-0501-68.37GB /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 4. c2t5006016041E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 5. c2t5006016941E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 6. c3t5006016841E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 7. c3t5006016141E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 8. emcpower0a DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /pseudo/[EMAIL PROTECTED] Specify disk (enter its number): ^D so I do run replace command and I get and error: bash-3.00# zpool replace mypool c1t2d0 emcpower0a cannot replace c1t2d0 with emcpower0a: device is too small Any idea what I am doing wrong? Why it thinks that emcpower0a is too small? Regards, Chris On Thu, 31 May 2007, Richard Elling wrote: Krzys wrote: Sorry to bother you but something is not clear to me regarding this process.. Ok, lets sat I have two internal disks (73gb each) and I am mirror them... now I want to replace those two mirrored disks into one LUN that is on SAN and it is around 100gb. Now I do meet one requirement of having more than 73gb of storage but do I need only something like 73gb at minimum or do I actually need two luns of 73gb or more since I have it mirrored? You can attach any number of devices to a mirror. You can detach all but one of the devices from a mirror. Obviously, when the number is one, you don't currently have a mirror. The resulting logical size will be equivalent to the smallest device. My goal is simple to move data of two mirrored disks into one single SAN device... Any ideas if what I am planning to do is duable? or do I need to use zfs send and receive and just update everything and switch when I am done? 
Or do I just add this SAN disk to the existing pool and then remove the mirror somehow? I would just have to make sure that all data is off that disk... Is there any option to evacuate data off that mirror?

The ZFS terminology is attach and detach. A replace is an attach followed by a detach. It is a good idea to verify that the sync has completed before detaching; zpool status will show the current status.
 -- richard
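[Editor's note: a minimal sketch of the attach-then-detach migration Richard describes, using the pool and device names from Chris's output. Whether emcpower0a is accepted depends on the size/label issue discussed below, and the resilver time depends on the amount of data in the pool.]

# Attach the SAN LUN as a second side of the mirror (pool stays online)
zpool attach mypool c1t2d0 emcpower0a

# Wait until zpool status reports the resilver has completed
zpool status mypool

# Only then detach the old internal disk
zpool detach mypool c1t2d0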
Re: [zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not? Thoughts. Considerations.]
On May 31, 2007 1:59:04 PM -0700 Richard Elling [EMAIL PROTECTED] wrote:

CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. These ...

If CF cards aren't fast, how will putting them into a different form factor make them faster?

-frank
Re: [zfs-discuss] zfs migration
On 5/31/07, Krzys [EMAIL PROTECTED] wrote:

So I run the replace command and I get an error:
bash-3.00# zpool replace mypool c1t2d0 emcpower0a
cannot replace c1t2d0 with emcpower0a: device is too small

Try 'zpool attach mypool c1t2d0 emcpower0a' instead; see
http://docs.sun.com/app/docs/doc/819-5461/6n7ht6qrt?a=view .

Will
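[Editor's note: a hedged sketch for checking why ZFS considers emcpower0a too small. The idea is to compare the usable sector count of the slice ZFS is being offered with the slice currently backing the pool; the device/slice names follow Chris's output, and the exact slice in use may differ on his system.]

# Sector count of the slice currently backing the pool
# (whole-disk vdevs typically sit on s0 under an EFI label)
prtvtoc /dev/rdsk/c1t2d0s0

# Sector count of the slice being offered as the new mirror side
prtvtoc /dev/rdsk/emcpower0a

# If the emcpower0a slice covers only part of the 100GB LUN, relabel it with
# format(1M) or fmthard so the slice spans the whole device, then retry the
# attach.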
Re: [zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not? Thoughts. Considerations.]
Frank Cusack wrote:
On May 31, 2007 1:59:04 PM -0700 Richard Elling [EMAIL PROTECTED] wrote:
CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. These ...
If CF cards aren't fast, how will putting them into a different form factor make them faster?

Well, if I were doing that, I'd use DRAM and provide enough on-board capacitance and a small processor to copy the contents of the DRAM to flash on power failure.

- Bart

--
Bart Smaalders            Solaris Kernel Performance
[EMAIL PROTECTED]         http://blogs.sun.com/barts
Re: [zfs-discuss] zfs boot error recovery
On 5/31/07, Jakob Praher [EMAIL PROTECTED] wrote:

c) si 3224 related question: is it possible to simply hot swap the disk? (I have the disks in special hot-swappable units, but have no experience with hot swapping under Solaris, so I would like some feedback.)

As it happens, I tried this recently - albeit on a different card - and it went well. I have a Marvell 88SX6081 controller, and removing a disk caused no undue panic (as far as I can tell). When I added a new disk, the kernel detected it immediately, and then I had to run 'cfgadm -c configure scsi0/1' or something like that. Then it Just Worked.

I don't know if this is recommended or not... but it worked for me.

Will
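[Editor's note: for what it's worth, a rough sketch of that hot-swap sequence on a SATA-framework controller. The attachment-point name (sata1/3) and the pool/device names are examples and will differ per system; check cfgadm -al first.]

# List attachment points and find the slot holding the swapped disk
cfgadm -al

# Configure the newly inserted disk (attachment point is an example)
cfgadm -c configure sata1/3

# Have ZFS resilver onto the new disk in the same slot (names are examples)
zpool replace datapool c1t3d0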
[zfs-discuss] Re: zfs boot error recovery
Jakob Praher wrote:

hi all,

I would like to ask some questions regarding best practices for ZFS recovery if disk errors occur. Currently I have zfs boot (nv62) and the following setup:

2 si3224 controllers (each with 4 SATA disks)
8 SATA disks, same size, same type

I have two pools: a) rootpool, b) datapool. The rootpool is a mirrored pool, where every disk has a slice (s0, which is 5% of the whole disk) devoted to the rootpool, just for mirroring. The rest of each disk (s1) is added to the datapool, which is raidz. My idea is that if any disk is corrupt I am still able to boot.

Now I have some questions:

a) If I want to boot from every disk in case of error, I have to set up grub on every disk, so that if the controller selects that disk for booting, the rootpool can be loaded from it.

b) What is the best way to replace a disk as fast as possible? Adding a disk as a hot spare for the raidz is a good idea, but I would also like to replace a disk at runtime as simply as possible. The problem is that for the root pool the disks are labeled (the slices thingy), so I cannot simply detach the volumes, replace the disk, and attach them again; I have to format the disk so that the slicing exists. Is there some clever way to automatically re-label a replacement disk? I found out that storing or getting the label information from another disk should work:

prtvtoc /dev/rdsk/s2 | fmthard -s - /dev/rdsk/s2

For instance, I could simply store the labels of all disks in the root pool, which should be available as long as any of the 8 disks is still available. So in case of a repair I simply have to fmthard the disk before attaching the replacement.

c) si 3224 related question: is it possible to simply hot swap the disk? (I have the disks in special hot-swappable units, but have no experience with hot swapping under Solaris, so I would like some feedback.)

d) Do you have best practices for systems like the one above? What are the best resources on the web for learning about monitoring the health of a ZFS system (like email notifications in case of disk failures)?

thanks in advance
-- Jakob
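[Editor's note: a hedged sketch of the relabel-and-reattach steps for questions (a) and (b), assuming an x86 system booting via grub. The disk names (c2t0d0 as a surviving mirror member, c2t4d0 as the replacement) are placeholders for Jakob's actual c-t-d numbers.]

# (b) Copy the slice table from a surviving disk to the replacement
prtvtoc /dev/rdsk/c2t0d0s2 | fmthard -s - /dev/rdsk/c2t4d0s2

# (a) Install grub on the replacement's root slice so it stays bootable
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t4d0s0

# Rejoin the slices to their pools (same device names, so one-argument replace)
zpool replace rootpool c2t4d0s0
zpool replace datapool c2t4d0s1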