Re: [zfs-discuss] Trying to understand zfs RAID-Z
> If I understand correctly, the parity blocks for RAID-Z are also written
> in two different atomic operations, as with RAID-5 (the only difference
> being that each can be of a different stripe size).

HL> As with RAID-5 on a four-disk stripe, there are four independent
HL> writes, and they don't need to be atomic, as copy-on-write implies
HL> that the new blocks are written elsewhere on disk while the original
HL> data is maintained. Only after all four writes return and are flushed
HL> to disk can you proceed and update the metadata.

HL> And to clarify: metadata are also updated in the spirit of COW, so
HL> metadata are written to new locations and then the uberblock is
HL> atomically updated to point to the new metadata.

Well, to add to this, uberblocks are also updated in COW fashion - there is
a circular array of 128 uberblocks, and the new uberblock is written to the
slot next to the current one.

victor
Re: [zfs-discuss] Reading a ZFS Snapshot
mnh wrote:
> Hi, I was wondering if there is any way to read a ZFS snapshot using
> system/zfs lib (i.e. refer to it as a block device). I dug through the
> libzfs source but could not find anything that could enable me to 'read'
> the contents of a snapshot/filesystem.

Why? What problem are you trying to solve?

Given that you can't read the filesystem as a block device in the first
place, why would it make sense to do so for a snapshot of the filesystem?

-- Darren J Moffat
Re: [zfs-discuss] Trying to understand zfs RAID-Z
HL> And to clarify: metadata are also updated in the spirit of COW, so
HL> metadata are written to new locations and then the uberblock is
HL> atomically updated to point to the new metadata.

Victor Latushkin wrote:
> Well, to add to this, uberblocks are also updated in COW fashion - there
> is a circular array of 128 uberblocks, and the new uberblock is written
> to the slot next to the current one.

Correct, I left it out because there's more detail involved with the
uberblock. We can deal with it when we get there.

Cheers,
Henk
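As an aside, anyone curious about the uberblock ring on a live pool can ask
zdb to print the currently active uberblock; a small sketch (the pool name
"tank" is just an example, and the output details vary by release):

  # Show the active uberblock (txg, timestamp, etc.) for pool "tank"
  zdb -u tank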
Re: [zfs-discuss] 3320 JBOD setup
See inline near the end...

Tomas Ögren wrote:
> On 14 May, 2007 - Dale Sears sent me these 0,9K bytes:
>> I was wondering if this was a good setup for a 3320 single-bus,
>> single-host attached JBOD. There are 12 146G disks in this array.
>> I used:
>>
>>   zpool create pool1 \
>>     raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t8d0 c2t9d0 c2t10d0 \
>>     spare c2t11d0 c2t12d0
>>
>> (or something very similar) This yields a 1TB file system with dual
>> parity and two spare disks. So first any two disks can fail at the same
>> time... then, after rebuilding, two more disks can fail... until you've
>> replaced a disk. The customer is happy, but I wonder if there are any
>> other suggestions for making this array faster or more reliable or just
>> better in your opinion. I know that "better" has different meanings
>> under different application conditions, so I'm just looking for folks
>> to recommend a setup and perhaps explain why they would do it that way.
>
> That raid set will give you the same random IO performance as a single
> disk. Sequential IO will be better than a single disk. For instance,
> splitting it into two raidz2 groups without spares can survive any two
> disks within both groups (so 2 to 4 disks can fail without data loss).
> Random IO performance will be twice that of the single raidz2/single
> disk.

What would that command look like? Is this what you're saying?:

  zpool create pool1 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
    raidz2 c2t6d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0

Thanks!

Dale

> /Tomas
Re: [zfs-discuss] Reading a ZFS Snapshot
Darren J Moffat wrote:
> mnh wrote:
>> Hi, I was wondering if there is any way to read a ZFS snapshot using
>> system/zfs lib (i.e. refer to it as a block device). I dug through the
>> libzfs source but could not find anything that could enable me to
>> 'read' the contents of a snapshot/filesystem.
>
> Why? What problem are you trying to solve?

We are trying to implement a third-party backup/restore system for ZFS
(including bare-metal recovery). Essentially it requires the snapshot to be
read and stored in a proprietary format.

> Given that you can't read the filesystem as a block device in the first
> place, why would it make sense to do so for a snapshot of the filesystem?

I know it doesn't make much sense; I was just hoping that ZFS's snapshots
could be used by a different product/vendor.

Thanks,
mnh
Re: [zfs-discuss] Reading a ZFS Snapshot
mnh wrote:
> Darren J Moffat wrote:
>> mnh wrote:
>>> Hi, I was wondering if there is any way to read a ZFS snapshot using
>>> system/zfs lib (i.e. refer to it as a block device). I dug through the
>>> libzfs source but could not find anything that could enable me to
>>> 'read' the contents of a snapshot/filesystem.
>>
>> Why? What problem are you trying to solve?
>
> We are trying to implement a third-party backup/restore system for ZFS
> (including bare-metal recovery). Essentially it requires the snapshot to
> be read and stored in a proprietary format.

Is there a reason why you can't just walk through the snapshot using POSIX
APIs? The snapshot is mounted at
<root-of-dataset>/.zfs/snapshot/<name-of-snapshot>.

Or maybe zfs send/recv is what you need.

>> Given that you can't read the filesystem as a block device in the first
>> place, why would it make sense to do so for a snapshot of the
>> filesystem?
>
> I know it doesn't make much sense; I was just hoping that ZFS's
> snapshots could be used by a different product/vendor.

Sure they can, but I'm not sure you are approaching the problem from a view
that ZFS can give you on the data. It might help if you described how this
backup software works for other filesystems, e.g. UFS.

-- Darren J Moffat
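For illustration, a snapshot's contents can be walked like any read-only
directory tree; a minimal sketch (the pool, filesystem and snapshot names
here are made up):

  # Browse the snapshot through the hidden .zfs directory
  ls /tank/home/.zfs/snapshot/nightly1
  # Stream its files to any archive-oriented backup tool, e.g. cpio
  cd /tank/home/.zfs/snapshot/nightly1 && find . | cpio -oc > /var/tmp/nightly1.cpio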
Re: [zfs-discuss] Reading a ZFS Snapshot
Darren J Moffat wrote:
> Is there a reason why you can't just walk through the snapshot using
> POSIX APIs? The snapshot is mounted at
> <root-of-dataset>/.zfs/snapshot/<name-of-snapshot>.

We cannot walk through the mounted snapshot, as it's not just the data that
we are concerned about. We need to read the complete snapshot
(data + metadata).

> Or maybe zfs send/recv is what you need.

I looked at zfs send/recv and was pleasantly surprised by its capabilities.
Unfortunately it does not fit into our backup agent's design (NDMP based).
The agent reads directly off a snapshot - using zfs send would require
additional space to create the backup file (and we cannot do a zfs send to
the destination, even though that would have been nice :) ).

> Given that you can't read the filesystem as a block device in the first
> place, why would it make sense to do so for a snapshot of the filesystem?

>> I know it doesn't make much sense; I was just hoping that ZFS's
>> snapshots could be used by a different product/vendor.

> Sure they can, but I'm not sure you are approaching the problem from a
> view that ZFS can give you on the data. It might help if you described
> how this backup software works for other filesystems, e.g. UFS.

For UFS (as with other filesystems, e.g. NTFS) we use the filesystem's
local snapshot mechanism (fssnap/VSS). In all those cases you can refer to
the snapshot as a block device.

Thanks,
mnh
Re: [zfs-discuss] 3320 JBOD setup
On 18 May, 2007 - Dale Sears sent me these 1,5K bytes:

> Tomas Ögren wrote:
>> On 14 May, 2007 - Dale Sears sent me these 0,9K bytes:
>>> I was wondering if this was a good setup for a 3320 single-bus,
>>> single-host attached JBOD. There are 12 146G disks in this array.
>>> I used:
>>>
>>>   zpool create pool1 \
>>>     raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t8d0 c2t9d0 c2t10d0 \
>>>     spare c2t11d0 c2t12d0
>> [..]
>> That raid set will give you the same random IO performance as a single
>> disk. Sequential IO will be better than a single disk. For instance,
>> splitting it into two raidz2 groups without spares can survive any two
>> disks within both groups (so 2 to 4 disks can fail without data loss).
>> Random IO performance will be twice that of the single raidz2/single
>> disk.
>
> What would that command look like? Is this what you're saying?:
>
>   zpool create pool1 \
>     raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
>     raidz2 c2t6d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0
>
> Thanks!

Yep. Verify the performance difference between the two layouts for your
usage case. Its reliability against failures is a bit more of a gamble than
one big raidz2 with two hot spares: if you're lucky, four disks can blow up
at the same time without problems (vs. two in your version); if you're
unlucky, two disks from the same set blow up and then another one fails
before you've had the chance to replace them with cold spare(s) - say the
first two fail and then a third over a weekend. A hot spare could have
saved you then. If you have a cold spare lying around and replace a disk as
soon as one breaks, this shouldn't be a problem, but it can make a
difference; it's up to you to decide (or attach a single additional hot
spare outside the 3320).

/Tomas

-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
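If you do go with the two-raidz2 layout and later want some of that safety
margin back, a hot spare can be added to the existing pool afterwards; a
small sketch (the device name is only a placeholder for whatever disk sits
outside the 3320):

  # Add a shared hot spare to the pool after the fact
  zpool add pool1 spare c3t0d0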
[zfs-discuss] Re: Reading a ZFS Snapshot
I think it would be handy if a utility could read a full ZFS snapshot and
restore subsets of files or directories, using something like tar -xf or
ufsrestore -i.
[zfs-discuss] DBMS on zpool
Hi,

Just playing around with ZFS, trying to place DBMS data files on a zpool.
The DBMSs I mean here are Oracle and Informix. I've noticed that read
performance is excellent, but write performance is not, and it also varies
a lot. My guess for the poor and variable write performance is double
buffering: DBMS buffers and ZFS caching together.

Has anyone seen or tested best practices for how a DBMS setup should be
implemented on a zpool - zfs or zvol?

Thanks
Re: [zfs-discuss] Trying to understand zfs RAID-Z
Quoth Steven Sim on Thu, May 17, 2007 at 09:55:37AM +0800:
> Gurus;
> I am exceedingly impressed by ZFS, although it is my humble opinion that
> Sun is not doing enough evangelizing for it.

What else do you think we should be doing?

David
[zfs-discuss] Re: Reading a ZFS Snapshot
I'm not sure what you want that the file system does not already provide.
You can use cp to copy files out, or find(1) to find them based on time or
any other attribute, and then cpio to copy them out.
RE: [zfs-discuss] DBMS on zpool
This is probably a good place to start:

  http://blogs.sun.com/realneel/entry/zfs_and_databases

Please post back to the group with your results; I'm sure many of us are
interested.

Thanks,
-- MikeE

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of homerun
Sent: Friday, May 18, 2007 8:42 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] DBMS on zpool

Hi,

Just playing around with ZFS, trying to place DBMS data files on a zpool.
The DBMSs I mean here are Oracle and Informix. I've noticed that read
performance is excellent, but write performance is not, and it also varies
a lot. My guess for the poor and variable write performance is double
buffering: DBMS buffers and ZFS caching together.

Has anyone seen or tested best practices for how a DBMS setup should be
implemented on a zpool - zfs or zvol?

Thanks
[zfs-discuss] Re: Reading a ZFS Snapshot
An example would be if you had a raw snapshot on tape. A single file or
subset of files could be restored from it without needing the space to load
the full snapshot into a zpool. This would be handy if you have a zpool
with 500GB of space and 300GB used. If you had a snapshot that was 250GB
and wanted to load it back up to restore a file, you wouldn't have
sufficient space.
[zfs-discuss] AVS replication vs ZFS send recieve for odd sized volume pairs
Hello all,

I am interested in setting up an HA NFS server with ZFS as the storage
filesystem on Solaris 10 + Sun Cluster 3.2. This is an HPC environment with
a 70-node cluster attached. File sizes are 1-200 MB or so, with an average
around 10 MB.

I have two servers, and due to changing specs over time I have ended up
with heterogeneous storage. They are physically close to each other, so
there are no offsite replication needs. Server A has an Areca 12-port RAID
card attached to 12x400 GB drives. Server B has an onboard RAID with 6
available slots which I plan on populating with either 750 GB or 1 TB
drives.

With AVS 4.0 (which I have running on a test volume pair) I am able to
mirror the zpools at the block level, but I am forced to have an equal
number of LUNs for it to work (AVS mirrors the block devices that ZFS sits
on top of). If I carve up each RAID set into 4 volumes, put AVS on those
(plus bitmap volumes) and then ZFS stripe over that, theoretically I am in
business, although this has a couple of downsides.

If I want to maximize my performance while keeping a margin of safety in
this replicated environment, how can I best use my storage?

Option one: AVS + hardware RAID-5 on each side. Make 4 LUNs and ZFS stripe
on top. Hardware RAID takes care of drive failure. AVS ensures that the
whole storage pool is replicated at all times to server B. This method does
not take advantage of the disk caching ZFS can do, nor the additional I/O
scheduling ZFS would like to manage at the drive level. Also unknown is how
the SC3.2 HA ZFS module will work on an AVS-replicated ZFS filesystem, as I
believe it was designed for a fibre-channel shared set of disks. On the
plus side, with this method we have block-level replication, so close to
instantaneous sync between filesystems.

Option two: Full ZFS pools on both sides, using zfs send + zfs receive for
the replication. This has benefits because my pools can be different sizes
and can grow, and that's OK. The second pool could also be mounted on
server B (most of the time). The downside is that I have to hack together a
zfs send + receive script and cron job, which is not likely to be as
bombproof as the tried and tested AVS.

So... basically, how are you all doing replication between two different
disk topologies using ZFS? I am a Solaris newbie, attracted by the smell of
ZFS, so pardon my lack of in-depth knowledge of these issues.

Thank you in advance.

Ahab
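For what it's worth, a minimal sketch of the kind of send/receive cron job
that option two implies (the host, pool and filesystem names and the state
file are all made up, it assumes a zfs receive that supports -F to roll the
target back, and it is not a tested script):

  #!/bin/sh
  # Incremental replication of one filesystem to a second server
  FS=tank/export
  REMOTE=serverB
  LAST=`cat /var/run/last-repl-snap`       # snapshot sent on the previous run
  NOW=repl-`date +%Y%m%d%H%M%S`
  zfs snapshot $FS@$NOW
  # Send only the changes since the last replicated snapshot and apply them remotely
  zfs send -i $FS@$LAST $FS@$NOW | ssh $REMOTE zfs receive -F $FS && \
      echo $NOW > /var/run/last-repl-snap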
Re: [zfs-discuss] Re: Reading a ZFS Snapshot
On 18-May-07, at 1:57 PM, William D. Hathaway wrote:
> An example would be if you had a raw snapshot on tape.

Unless I misunderstand ZFS, you can archive the contents of a snapshot, but
there's no concept of a 'raw snapshot' divorced from a filesystem.

> A single file or subset of files could be restored from it without
> needing the space to load the full snapshot into a zpool. This would be
> handy if you have a zpool with 500GB of space and 300GB used. If you had
> a snapshot that was 250GB and wanted to load it back up to restore a
> file, you wouldn't have sufficient space.
Re: [zfs-discuss] Trying to understand zfs RAID-Z
David Bustos wrote:
> Quoth Steven Sim on Thu, May 17, 2007 at 09:55:37AM +0800:
>> Gurus;
>> I am exceedingly impressed by ZFS, although it is my humble opinion
>> that Sun is not doing enough evangelizing for it.
>
> What else do you think we should be doing?

Send Thumpers to every respectable journal for a review!

That's probably a problem for marketing: how to target the publications
that the people with the chequebooks read, to broaden awareness of ZFS.

Just about every x86 server manufacturer provides and promotes the features
of hardware RAID solutions; maybe Sun should make more of the cost savings
in storage that ZFS offers to gain a cost advantage over the competition,
or even save money on HP servers by running Solaris and removing the RAID.

How about some JBOD-only storage products? Or at least make hardware RAID
an add-on option, to cater for a broader market. Trying to break
(especially Windows) administrators and CIOs out of the "hardware RAID is
best" or even "hardware RAID is essential" mindset is a tough ask. As
hardware RAID drops in price and moves into consumer-grade products, ZFS
will lose the cost advantage (just try to get a JBOD-only SATA card; I only
know of one).

Ian
[zfs-discuss] Making 'zfs destroy' safer
Hello,

With the advent of clones and snapshots, one will of course start creating
them. Which also means destroying them.

Am I the only one who is *extremely* nervous about doing "zfs destroy
some/[EMAIL PROTECTED]"? This goes both for doing it manually and for doing
it automatically in a script. I am very paranoid about this, especially
because the @ sign might conceivably be incorrectly interpreted by some
layer of scripting, being a non-alphanumeric character and highly atypical
for filenames/paths.

What about having dedicated commands "destroysnapshot", "destroyclone", or
"remove" (a less dangerous variant of destroy) that will never do anything
but remove snapshots or clones? Alternatively, something along the lines of
"zfs destroy --nofs" or "zfs destroy --safe".

I realize this is borderline the same territory as special-casing "rm -rf /"
and similar, which is generally not considered a good idea. But somehow the
snapshot situation feels a lot more risky.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org
Re: [zfs-discuss] Making 'zfs destroy' safer
> What about having dedicated commands "destroysnapshot", "destroyclone",
> or "remove" (a less dangerous variant of destroy) that will never do
> anything but remove snapshots or clones? Alternatively, something along
> the lines of "zfs destroy --nofs" or "zfs destroy --safe".

Another option is to allow something along the lines of:

  zfs destroy snapshot:/path/to/[EMAIL PROTECTED]

where the use of "snapshot:" would guarantee that non-snapshots are not
affected.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org
Re: [zfs-discuss] Trying to understand zfs RAID-Z
On 18-May-07, at 4:39 PM, Ian Collins wrote:
> David Bustos wrote: ...
> maybe Sun should make more of the cost savings in storage ZFS offers to
> gain a cost advantage over the competition,

Cheaper AND more robust+featureful is hard to beat. --T
Re: [zfs-discuss] DBMS on zpool
homerun wrote:
> Hi,
> Just playing around with ZFS, trying to place DBMS data files on a
> zpool. The DBMSs I mean here are Oracle and Informix. I've noticed that
> read performance is excellent, but write performance is not, and it also
> varies a lot. My guess for the poor and variable write performance is
> double buffering: DBMS buffers and ZFS caching together.
> Has anyone seen or tested best practices for how a DBMS setup should be
> implemented on a zpool - zfs or zvol?

Neel has spent some time on this topic. I'd start with his blog.
  http://blogs.sun.com/realneel

Additional blogs to check are Roch's and Bob Sneed's, for discussions on
caching and direct I/O.
  http://blogs.sun.com/roch
  http://blogs.sun.com/bobs

We've been trying to collect the wisdom onto one site, but it is getting a
little crowded and therefore tends to be terse. The blogs explain the
concepts in more detail, and more conversationally.
  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

-- richard
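As one concrete illustration of the kind of tuning those links cover, a
commonly cited starting point is to match the dataset recordsize to the
database block size for data files and to keep logs on a separate dataset;
a sketch, assuming an 8 KB database block size and hypothetical dataset
names (check your own DB's block size):

  # Data files: recordsize matched to the DB block size
  zfs create -o recordsize=8k tank/oradata
  # Redo/transaction logs: separate dataset with the default recordsize
  zfs create tank/oralogs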
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Queuing theory should explain this rather nicely. iostat measures %busy by
counting whether there is an entry in the queue at each clock tick. There
are two queues, one in the controller and one on the disk. As you can
clearly see, the way ZFS pushes the load is very different from dd or UFS.
-- richard

Marko Milisavljevic wrote:
> I am very grateful to everyone who took the time to run a few tests to
> help me figure out what is going on. As per j's suggestions, I tried
> some simultaneous reads, and a few other things, and I am getting
> interesting and confusing results.
>
> All tests are done using two Seagate 320G drives on sil3114. In each
> test I am using dd if= of=/dev/null bs=128k count=1. Each drive is
> freshly formatted with one 2G file copied to it. That way dd from the
> raw disk and from the file are using roughly the same area of the disk.
> I tried using raw, ZFS and UFS, single drives and two simultaneously
> (just executing dd commands in separate terminal windows). These are
> snapshots of "iostat -xnczpm 3" captured somewhere in the middle of the
> operation. I am not bothering to report CPU% as it never rose over 50%,
> and was uniformly proportional to reported throughput.
>
> single drive, raw:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  1378.4   0.0 77190.7   0.0   0.0   1.7     0.0     1.2   0  98  c0d1
>
> single drive, ufs file:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  1255.1   0.0 69949.6   0.0   0.0   1.8     0.0     1.4   0 100  c0d0
>
> Small slowdown, but pretty good.
>
> single drive, zfs file:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>   258.3   0.0 33066.6   0.0  33.0   2.0   127.7     7.7 100 100  c0d1
>
> Now that is odd. Why so much waiting? Also, unlike with raw or UFS,
> kr/s over r/s gives 128K, as I would imagine it should.
>
> simultaneous raw:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>   797.0   0.0 44632.0   0.0   0.0   1.8     0.0     2.3   0 100  c0d0
>   795.7   0.0 44557.4   0.0   0.0   1.8     0.0     2.3   0 100  c0d1
>
> This PCI interface seems to be saturated at 90MB/s. Adequate if the goal
> is to serve files on a gigabit SOHO network.
>
> simultaneous raw on c0d1 and ufs on c0d0:
>                     extended device statistics
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>   722.4   0.0 40246.8   0.0   0.0   1.8     0.0     2.5   0 100  c0d0
>   717.1   0.0 40156.2   0.0   0.0   1.8     0.0     2.5   0  99  c0d1
>
> Hmm, can no longer get the 90MB/sec.
>
> simultaneous zfs on c0d1 and raw on c0d0:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>     0.0   0.7     0.0   1.8   0.0   0.0     0.0     0.1   0   0  c1d0
>   334.9   0.0 18756.0   0.0   0.0   1.9     0.0     5.5   0  97  c0d0
>   172.5   0.0 22074.6   0.0  33.0   2.0   191.3    11.6 100 100  c0d1
>
> Everything is slow. What happens if we throw the onboard IDE interface
> into the mix?
>
> simultaneous raw SATA and raw PATA:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  1036.3   0.3 58033.9   0.3   0.0   1.6     0.0     1.6   0  99  c1d0
>  1422.6   0.0 79668.3   0.0   0.0   1.6     0.0     1.1   1  98  c0d0
>
> Both at maximum throughput. Read ZFS on the SATA drive and raw disk on
> the PATA interface:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>  1018.9   0.3 57056.1   4.0   0.0   1.7     0.0     1.7   0  99  c1d0
>   268.4   0.0 34353.1   0.0  33.0   2.0   122.9     7.5 100 100  c0d0
>
> SATA is slower with ZFS, as expected by now, but PATA remains at full
> speed. So they are operating quite independently. Except... what if we
> read a UFS file from the PATA disk and ZFS from SATA:
>     r/s   w/s    kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>   792.8   0.0 44092.9   0.0   0.0   1.8     0.0     2.2   1  98  c1d0
>   224.0   0.0 28675.2   0.0  33.0   2.0   147.3     8.9 100 100  c0d0
>
> Now that is confusing! Why did SATA/ZFS slow down too? I've retried this
> a number of times; it is not a fluke.
>
> Finally, after reviewing all this, I've noticed another interesting bit:
> whenever I read from raw disks or UFS files, SATA or PATA, kr/s over r/s
> is 56K, suggesting that the underlying IO system is using that as some
> kind of native block size (even though dd is requesting 128k). But when
> reading ZFS files, this always comes to 128K, which is expected, since
> that is the ZFS default (and the same thing happens regardless of bs= in
> dd).
>
> On the theory that my system just doesn't like 128k reads (I'm
> desperate!), and that this would explain the whole slowdown and the
> wait/wsvc_t columns, I tried changing recsize to 32k and rewriting the
> test file. However, accessing ZFS files continues to show 128k reads,
> and it is just as slow. Is there a way either to confirm that the ZFS
> file in question is indeed written with 32k records or, even better, to
> force ZFS to use 56k when accessing the disk? Or perhaps I just
> misunderstand the implications of the iostat output. I've repeated each
> of these tests a few times and double-checked, and the numbers, although
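On the question of confirming the record size actually in use, two hedged
pointers (the dataset name is hypothetical, and recordsize only affects
files written after the property is changed):

  # Confirm the property on the dataset
  zfs get recordsize tank/test
  # Dump per-object metadata, including data block sizes, for the dataset
  # (output format varies by release)
  zdb -dddd tank/test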
[zfs-discuss] Re: ZFS over a layered driver interface
I explored this a bit and found that the ldi_ioctl in my layered driver
does fail, but it fails with an "inappropriate ioctl for device" error,
which the underlying ramdisk driver's ioctl returns. So that doesn't seem
to be an issue at all (since I know the storage pool creation is successful
when I give the ramdisk directly as the target device).

However, as I mentioned, even though reads and writes are getting invoked
on the ramdisk through my layered driver, the storage pool creation still
fails. Surprisingly, the layered driver's routines show no sign of error -
as in, the layered device gets closed successfully when the pool creation
command returns.

It is unclear to me what would be a good way to go about debugging this,
since I'm not familiar with DTrace. I shall try to familiarize myself with
it, but even then, it seems like there are a large number of functions
returning non-zero values, and it is confusing to me where to look for the
error. Any pointers would be most welcome!!

Thanks,
Swetha.
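Not an authoritative recipe, but one way to start narrowing this down with
DTrace is to watch for kernel functions in the zfs module (or in your
layered driver's module) that return non-zero while the pool creation runs;
a rough sketch:

  # Print every zfs-module function that returns a non-zero value
  # (expect noise: many functions legitimately return pointers or counts)
  dtrace -n 'fbt:zfs::return /arg1 != 0/ { printf("%s returned %d", probefunc, arg1); }'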
Re: [zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
On 5/17/07, Robert Milkowski [EMAIL PROTECTED] wrote:
> Hello Phillip,
>
> Thursday, May 17, 2007, 6:30:38 PM, you wrote:
> PF> Given: A Solaris 10 u3 server with an externally attached disk array
> PF> with RAID controller(s).
> PF> Question: Is it better to create a zpool from a single external LUN
> PF> on an external disk array, or is it better to use no RAID on the
> PF> disk array and just present individual disks to the server and let
> PF> ZFS take care of the RAID?
>
> Then the other thing - do you use SATA disks? How much of an issue is
> data loss or corruption for you? Doing software RAID in ZFS can detect
> AND correct such problems. HW RAID also can, but to a much lesser
> extent.

I think this point needs to be emphasized. If reliability is a prime
concern, you absolutely want to let ZFS handle redundancy in one way or
another, either as mirroring or as raidz. You can think of redundancy in
ZFS as much the same thing as packet retransmission in TCP: if the data
comes through bad the first time, checksum verification will catch it, and
you get a second chance to get the correct data. A single-LUN zpool is the
moral equivalent of disabling retransmission in TCP.

Chad Mynhier
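To make the contrast concrete, a sketch of the two approaches (the device
names are hypothetical); only the second gives ZFS the redundancy it needs
to repair a block that fails checksum verification:

  # Single hardware-RAID LUN: ZFS detects corruption but cannot self-heal
  zpool create tank c3t0d0
  # Two LUNs (or raw disks) mirrored by ZFS: a bad block is re-read from
  # the other side and rewritten
  zpool create tank mirror c3t0d0 c3t1d0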
[zfs-discuss] Re: AVS replication vs ZFS send recieve for odd sized volume pairs
Yes, I am also interested in this. We can't afford two super-fast setups,
so we are looking at having a huge pile of SATA disks act as a real-time
backup for all our streams. So what can AVS do, and what are its
limitations? Would just using zfs send and receive do, or does AVS make it
all seamless?
Re: [zfs-discuss] Making 'zfs destroy' safer
> With the advent of clones and snapshots, one will of course start
> creating them. Which also means destroying them. Am I the only one who
> is *extremely* nervous about doing "zfs destroy some/[EMAIL PROTECTED]"?
> This goes both for doing it manually and automatically in a script. I am
> very paranoid about this, especially because the @ sign might
> conceivably be incorrectly interpreted by some layer of scripting, being
> a non-alphanumeric character and highly atypical for filenames/paths.
>
> What about having dedicated commands "destroysnapshot", "destroyclone",
> or "remove" (a less dangerous variant of destroy) that will never do
> anything but remove snapshots or clones? Alternatively, something along
> the lines of "zfs destroy --nofs" or "zfs destroy --safe".

Apparently (and I'm not sure where this is documented), you can 'rmdir' a
snapshot to remove it (in some cases). A normal (populated) directory
wouldn't be removable with a single rmdir, so in some sense it's safer.
Personally, I would prefer that file operations (like mv and rmdir)
couldn't affect snapshots.

> I realize this is borderline the same territory as special-casing
> "rm -rf /" and similar, which is generally not considered a good idea.
> But somehow the snapshot situation feels a lot more risky.

Agreed. I'm somewhat used to the VxVM command set, which requires the type
of object to be passed in in some cases (even though the name is
necessarily unique and would be enough to identify the object):

  vxassist -g diskgroup remove volume volumename
  vxassist -g diskgroup remove mirror mirrorname

It doesn't feel unnatural to me to specify things this way.

-- 
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
Re: [zfs-discuss] Making 'zfs destroy' safer
Rather than rehash this, again, from scratch, refer to a previous
rehashing:
  http://www.opensolaris.org/jive/thread.jspa?messageID=15363
-- richard

Peter Schuller wrote:
> Hello,
> With the advent of clones and snapshots, one will of course start
> creating them. Which also means destroying them. Am I the only one who
> is *extremely* nervous about doing "zfs destroy some/[EMAIL PROTECTED]"?
> This goes both for doing it manually and automatically in a script. I am
> very paranoid about this, especially because the @ sign might
> conceivably be incorrectly interpreted by some layer of scripting, being
> a non-alphanumeric character and highly atypical for filenames/paths.
>
> What about having dedicated commands "destroysnapshot", "destroyclone",
> or "remove" (a less dangerous variant of destroy) that will never do
> anything but remove snapshots or clones? Alternatively, something along
> the lines of "zfs destroy --nofs" or "zfs destroy --safe".
>
> I realize this is borderline the same territory as special-casing
> "rm -rf /" and similar, which is generally not considered a good idea.
> But somehow the snapshot situation feels a lot more risky.
Re: [zfs-discuss] Making 'zfs destroy' safer
> Rather than rehash this, again, from scratch, refer to a previous
> rehashing:
>   http://www.opensolaris.org/jive/thread.jspa?messageID=15363

That thread really did quickly move to arguments about confirmations and
their usefulness or annoyance. I think the idea presented here of adding
something like a filter is slightly different. It wouldn't require
confirmation or modification of the existing behavior (and it wouldn't be
relevant to the original issue in that other thread):

  destroy obj           # destroys any existing obj if possible
  destroy snapshot obj  # destroys obj only if it is a snapshot

-- 
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
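Until something like that exists, a similar guard can be approximated in a
wrapper; a minimal, untested sketch (the script name is made up, and it
assumes a zfs get that supports -H and -o) which refuses to destroy
anything whose type property is not "snapshot":

  #!/bin/sh
  # destroysnap.sh <dataset@snapshot> - destroy only if the target is a snapshot
  TARGET="$1"
  TYPE=`zfs get -H -o value type "$TARGET"` || exit 1
  if [ "$TYPE" != "snapshot" ]; then
      echo "refusing to destroy $TARGET: type is $TYPE, not snapshot" >&2
      exit 1
  fi
  zfs destroy "$TARGET"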
Re: [zfs-discuss] Making 'zfs destroy' safer
Hey, that's nothing. I had one ZFS file system, then I cloned it, so I
thought that I had two separate file systems, and I was making snaps of
both of them. Later on I decided I did not need the original file system
with its snaps, so I recursively removed it. All of a sudden I got a
message that the clone file system is mounted and cannot be removed - my
heart stopped for a second, as that clone was a file system I was using. I
suspect that I had not promoted the clone to be completely standalone; ehh,
I had no idea that was the case... but it did scare me how easily I could
lose a file system by removing the wrong thing.

Chris