Re: [zfs-discuss] Dedup - Does on imply sha256?

2010-08-24 Thread Jeff Bonwick
Correct. Jeff On Aug 24, 2010, at 9:45 PM, Peter Taps wrote: Folks, One of the articles on the net says that the following two commands are exactly the same: # zfs set dedup=on tank # zfs set dedup=sha256 tank Essentially, on is just a pseudonym for sha256 and verify is just a
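
For illustration (not from the original message), the equivalent settings side by side, assuming a pool named tank:

    # zfs set dedup=on tank              # uses the default dedup checksum, sha256
    # zfs set dedup=sha256 tank          # the same thing, spelled out explicitly
    # zfs set dedup=sha256,verify tank   # checksum match plus a byte-for-byte compare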

Re: [zfs-discuss] gang blocks at will?

2010-05-26 Thread Jeff Bonwick
You can set metaslab_gang_bang to (say) 8k to force lots of gang block allocations. Jeff On May 25, 2010, at 11:42 PM, Andriy Gapon wrote: I am working on improving some ZFS-related bits in FreeBSD boot chain. At the moment it seems that the things work mostly fine except for a case
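
For context, a sketch of how such a ZFS module tunable is usually set on Solaris/OpenSolaris; the variable name comes from the message, the value is only an example:

    * in /etc/system (takes effect on the next boot)
    set zfs:metaslab_gang_bang = 0x2000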

Re: [zfs-discuss] Pool import with failed ZIL device now possible ?

2010-02-16 Thread Jeff Bonwick
People used fastfs for years in specific environments (hopefully understanding the risks), and disabling the ZIL is safer than fastfs. Seems like it would be a useful ZFS dataset parameter. We agree. There's an open RFE for this: 6280630 zil synchronicity No promise on date, but it will
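
For context, a hedged sketch of the knobs involved (dataset name hypothetical): at the time the only lever was the pool-wide zil_disable tunable in /etc/system,

    set zfs:zil_disable = 1

while the RFE above later surfaced as a per-dataset property:

    # zfs set sync=disabled tank/scratch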

Re: [zfs-discuss] compressratio vs. dedupratio

2009-12-13 Thread Jeff Bonwick
It is by design. The idea is to report the dedup ratio for the data you've actually attempted to dedup. To get a 'diluted' dedup ratio of the sort you describe, just compare the space used by all datasets to the space allocated in the pool. For example, on my desktop, I have a pool called
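
A rough way to compute that diluted figure yourself (pool name hypothetical): compare the space charged to the datasets against the space actually allocated in the pool.

    # zfs list -r -o name,used tank    # USED as charged to each dataset
    # zpool list tank                  # ALLOC column: space actually allocated

The sum of the datasets' USED divided by the pool's ALLOC approximates the diluted dedup ratio.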

Re: [zfs-discuss] Doing ZFS rollback with preserving later created clones/snapshot?

2009-12-11 Thread Jeff Bonwick
Yes, although it's slightly indirect: - make a clone of the snapshot you want to roll back to - promote the clone See 'zfs promote' for details. Jeff On Fri, Dec 11, 2009 at 08:37:04AM +0100, Alexander Skwar wrote: Hi. Is it possible on Solaris 10 5/09, to rollback to a ZFS
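
A sketch of the sequence, with hypothetical names (filesystem tank/fs, snapshot @good):

    # zfs clone tank/fs@good tank/fs_restore
    # zfs promote tank/fs_restore

After the promote, tank/fs_restore owns the history up to @good, while later snapshots remain on the (now demoted) original, so nothing has to be destroyed.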

Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Jeff Bonwick
i am no pro in zfs, but to my understanding there is no original. That is correct. From a semantic perspective, there is no change in behavior between dedup=off and dedup=on. Even the accounting remains the same: each reference to a block is charged to the dataset making the reference. The

Re: [zfs-discuss] heads-up: dedup=fletcher4,verify was broken

2009-11-23 Thread Jeff Bonwick
And, for the record, this is my fault. There is an aspect of endianness that I simply hadn't thought of. When I have a little more time I will blog about the whole thing, because there are many useful lessons here. Thank you, Matt, for all your help with this. And my apologies to everyone else

Re: [zfs-discuss] heads-up: dedup=fletcher4,verify was broken

2009-11-23 Thread Jeff Bonwick
. Jeff On Mon, Nov 23, 2009 at 09:44:41PM -0800, Jeff Bonwick wrote: And, for the record, this is my fault. There is an aspect of endianness that I simply hadn't thought of. When I have a little more time I will blog about the whole thing, because there are many useful lessons here. Thank you

Re: [zfs-discuss] dedupe is in

2009-11-02 Thread Jeff Bonwick
Terrific! Can't wait to read the man pages / blogs about how to use it... Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup Enjoy, and let me know if you have any questions or suggestions for follow-on posts. Jeff

Re: [zfs-discuss] Apple cans ZFS project

2009-10-24 Thread Jeff Bonwick
Apple can currently just take the ZFS CDDL code and incorporate it (like they did with DTrace), but it may be that they wanted a private license from Sun (with appropriate technical support and indemnification), and the two entities couldn't come to mutually agreeable terms. I cannot

Re: [zfs-discuss] Replacing a failed drive

2009-06-19 Thread Jeff Bonwick
Yep, you got it. Jeff On Fri, Jun 19, 2009 at 04:15:41PM -0700, Simon Breden wrote: Hi, I have a ZFS storage pool consisting of a single RAIDZ2 vdev of 6 drives, and I have a question about replacing a failed drive, should it occur in future. If a drive fails in this double-parity vdev,
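
A sketch of the usual replacement, with hypothetical device names:

    # zpool status tank                    # identify the failed disk
    # zpool replace tank c1t3d0            # new disk inserted in the same slot
    # zpool replace tank c1t3d0 c2t1d0     # or substitute a disk in a different slot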

Re: [zfs-discuss] Mobo SATA migration to AOC-SAT2-MV8 SATA card

2009-06-19 Thread Jeff Bonwick
Yep, right again. Jeff On Fri, Jun 19, 2009 at 04:21:42PM -0700, Simon Breden wrote: Hi, I'm using 6 SATA ports from the motherboard but I've now run out of SATA ports, and so I'm thinking of adding a Supermicro AOC-SAT2-MV8 8-port SATA controller card. What is the procedure for
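
A sketch of the usual procedure; ZFS simply rediscovers the disks under their new device paths:

    # zpool export tank        # before powering down and recabling
    (move the disks to the new controller, boot)
    # zpool import tank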

Re: [zfs-discuss] Resilver Performance and Behavior

2009-05-03 Thread Jeff Bonwick
According to the ZFS documentation, a resilver operation includes what is effectively a dirty region log (DRL) so that if the resilver is interrupted, by a snapshot or reboot, the resilver can continue where it left off. That is not the case. The dirty region log keeps track of

Re: [zfs-discuss] Peculiarities of COW over COW?

2009-04-27 Thread Jeff Bonwick
ZFS blocksize is dynamic, power of 2, with a max size == recordsize. Minor clarification: recordsize is restricted to powers of 2, but blocksize is not -- it can be any multiple of sector size (512 bytes). For small files, this matters: a 37k file is stored in a 37k block. For larger,

Re: [zfs-discuss] Data size grew.. with compression on

2009-04-08 Thread Jeff Bonwick
Yes, I made note of that in my OP on this thread. But is it enough to end up with 8gb of non-compressed files measuring 8gb on reiserfs (linux) and the same data showing nearly 9gb when copied to a zfs filesystem with compression on? whoops.. a hefty exaggeration -- it only shows about

Re: [zfs-discuss] Data size grew.. with compression on

2009-03-30 Thread Jeff Bonwick
Right. Another difference to be aware of is that ZFS reports the total space consumed, including space for metadata -- typically around 1%. Traditional filesystems like ufs and ext2 preallocate metadata and don't count it as using space. I don't know how reiserfs does its bookkeeping, but I

Re: [zfs-discuss] RFE: creating multiple clones in one zfs(1) call and one txg

2009-03-29 Thread Jeff Bonwick
I agree with Chris -- I'd much rather do something like: zfs clone snap1 clone1 snap2 clone2 snap3 clone3 ... than introduce a pattern grammar. Supporting multiple snap/clone pairs on the command line allows you to do just about anything atomically. Jeff On Fri, Mar 27, 2009 at

Re: [zfs-discuss] Forensics related ZFS questions

2009-03-16 Thread Jeff Bonwick
1. Does variable FSB block sizing extend to files larger than record size, concerning the last FSB allocated? In other words, for files larger than 128KB, that utilize more than one full recordsize FSB, will the LAST FSB allocated be `right-sized' to fit the remaining data,

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Jeff Bonwick
I'm rather tired of hearing this mantra. [...] Every file system needs a repair utility Hey, wait a minute -- that's a mantra too! I don't think there's actually any substantive disagreement here -- stating that one doesn't need a separate program called /usr/sbin/fsck is not the same as

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Jeff Bonwick
This is CR 6667683 http://bugs.opensolaris.org/view_bug.do?bug_id=6667683 I think that would solve 99% of ZFS corruption problems! Based on the reports I've seen to date, I think you're right. Is there any EDT for this patch? Well, because of this thread, this has gone from on my list

Re: [zfs-discuss] Does your device honor write barriers?

2009-02-10 Thread Jeff Bonwick
well, if you want a write barrier, you can issue a flush-cache and wait for a reply before releasing writes behind the barrier. You will get what you want by doing this for certain. Not if the disk drive just *ignores* barrier and flush-cache commands and returns success. Some consumer

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-09 Thread Jeff Bonwick
There is no substitute for cord-yank tests - many and often. The weird part is, the ZFS design team simulated millions of them. So the full explanation remains to be uncovered? We simulated power failure; we did not simulate disks that simply blow off write ordering. Any disk that you'd

Re: [zfs-discuss] snapshot identity

2009-02-03 Thread Jeff Bonwick
The Validated Execution project is investigating how to utilize ZFS snapshots as the basis of a validated filesystem. Given that the blocks of the dataset form a Merkle tree of hashes, it seemed straightforward to validate the individual objects in the snapshot and then sign the hash of the

Re: [zfs-discuss] ZFS core contributor nominations

2009-02-02 Thread Jeff Bonwick
I would like to nominate roch.bourbonn...@sun.com for his work on improving the performance of ZFS over the last few years. Absolutely. Jeff

Re: [zfs-discuss] Where does set the value to zio->io_offset?

2009-01-24 Thread Jeff Bonwick
Each ZFS block pointer contains up to three DVAs (data virtual addresses), to implement 'ditto blocks' (multiple copies of the data, above and beyond any replication provided by mirroring or RAID-Z). Semantically, ditto blocks are a lot like mirrors, so we actually use the mirror code to read

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-13 Thread Jeff Bonwick
Off the top of my head nearly all of them. Some of them have artificial limitations because they learned the hard way that if you give customers enough rope they'll hang themselves. For instance unlimited snapshots. Oh, that's precious! It's not an arbitrary limit, it's a safety feature!

Re: [zfs-discuss] zpool mirror creation after non-mirrored zpool is set up

2008-12-13 Thread Jeff Bonwick
On Sat, Dec 13, 2008 at 04:44:10PM -0800, Mark Dornfeld wrote: I have installed Solaris 10 on a ZFS filesystem that is not mirrored. Since I have an identical disk in the machine, I'd like to add that disk to the existing pool as a mirror. Can this be done, and if so, how do I do it? Yes: #
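
A sketch of the answer being previewed above, with hypothetical device names; the boot-block step depends on the platform:

    # zpool attach rpool c0t0d0s0 c0t1d0s0     # attach the second disk as a mirror
    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0                  # x86
    # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0   # SPARC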

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-12 Thread Jeff Bonwick
I'm going to pitch in here as devil's advocate and say this is hardly revolution. 99% of what zfs is attempting to do is something NetApp and WAFL have been doing for 15 years+. Regardless of the merits of their patents and prior art, etc., this is not something revolutionarily new. It may

Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression

2008-11-29 Thread Jeff Bonwick
If you have more comments, or especially if you think I reached the wrong conclusion, please do post it. I will post my continuing results. I think your conclusions are correct. The main thing you're seeing is the combination of gzip-9 being incredibly CPU-intensive with our I/O pipeline

Re: [zfs-discuss] ZFS, Smashing Baby a fake???

2008-11-25 Thread Jeff Bonwick
I think we (the ZFS team) all generally agree with you. The current nevada code is much better at handling device failures than it was just a few months ago. And there are additional changes that were made for the FishWorks (a.k.a. Amber Road, a.k.a. Sun Storage 7000) product line that will make

Re: [zfs-discuss] Lost Disk Space

2008-11-02 Thread Jeff Bonwick
Are you running this on a live pool? If so, zdb can't get a reliable block count -- and zdb -L [live pool] emits a warning to that effect. Jeff On Thu, Oct 16, 2008 at 03:36:25AM -0700, Ben Rockwood wrote: I've been struggling to fully understand why disk space seems to vanish. I've dug

Re: [zfs-discuss] questions about replacing a raidz2 vdev disk with a larger one

2008-10-11 Thread Jeff Bonwick
ZFS will allow the replacement. The available size is, however, determined by the smallest of the lot. Once you've replaced *all* 500GB disks with 1TB disks, the available space will double. One suggestion: replace as many disks as you intend to at the same time, so that ZFS only has to do

Re: [zfs-discuss] questions about replacing a raidz2 vdev disk with a larger one

2008-10-11 Thread Jeff Bonwick
-- then Eric is right, and in fact I'd go further: in that case, replace only one at a time so you maintain the ability to survive a disk failing while you're doing all this. Jeff On Sat, Oct 11, 2008 at 06:37:17PM -0700, Erik Trimble wrote: Jeff Bonwick wrote: One suggestion: replace as many disks
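
A sketch of the conservative one-at-a-time approach (hypothetical device names):

    # zpool replace tank c1t0d0 c2t0d0     # swap one 500GB disk for a 1TB disk
    # zpool status tank                    # wait for the resilver to finish
    (only then move on to the next disk; the extra space appears after all disks are replaced)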

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Jeff Bonwick
The circumstances where I have lost data have been when ZFS has not handled a layer of redundancy. However, I am not terribly optimistic of the prospects of ZFS on any device that hasn't committed writes that ZFS thinks are committed. FYI, I'm working on a workaround for broken devices. As

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Jeff Bonwick
Or is there a way to mitigate a checksum error on non-redundant zpool? It's just like the difference between non-parity, parity, and ECC memory. Most filesystems don't have checksums (non-parity), so they don't even know when they're returning corrupt data. ZFS without any replication can

Re: [zfs-discuss] zpool file corruption

2008-09-25 Thread Jeff Bonwick
It's almost certainly the SIL3114 controller. Google SIL3114 data corruption -- it's nasty. Jeff On Thu, Sep 25, 2008 at 07:50:01AM +0200, Mikael Karlsson wrote: I have a strange problem involving changes in large file on a mirrored zpool in Open solaris snv96. We use it at storage in a

Re: [zfs-discuss] Remove log device?

2008-07-13 Thread Jeff Bonwick
You are correct, and it is indeed annoying. I hope to have this fixed by the end of the month. Jeff On Sun, Jul 13, 2008 at 10:16:55PM -0500, Mike Gerdts wrote: It seems as though there is no way to remove a log device once it is added. Is this correct? Assuming this is correct, is there

Re: [zfs-discuss] scrub never finishes

2008-07-13 Thread Jeff Bonwick
ZFS co-inventor Matt Ahrens recently fixed this: 6343667 scrub/resilver has to start over when a snapshot is taken Trust me when I tell you that solving this correctly was much harder than you might expect. Thanks again, Matt. Jeff On Sun, Jul 13, 2008 at 07:08:48PM -0700, Anil Jangity wrote:

Re: [zfs-discuss] scrub failing to initialise

2008-07-11 Thread Jeff Bonwick
If the cabling outage was transient, the disk driver would simply retry until they came back. If it's a hotplug-capable bus and the disks were flagged as missing, ZFS would by default wait until the disks came back (see zpool get failmode pool), and complete the I/O then. There would be no

Re: [zfs-discuss] is it possible to add a mirror device later?

2008-07-06 Thread Jeff Bonwick
I would just swap the physical locations of the drives, so that the second half of the mirror is in the right location to be bootable. ZFS won't mind -- it tracks the disks by content, not by pathname. Note that SATA is not hotplug-happy, so you're probably best off doing this while the box is

Re: [zfs-discuss] confusion and frustration with zpool

2008-07-06 Thread Jeff Bonwick
As a first step, 'fmdump -ev' should indicate why it's complaining about the mirror. Jeff On Sun, Jul 06, 2008 at 07:55:22AM -0700, Pete Hartman wrote: I'm doing another scrub after clearing insufficient replicas only to find that I'm back to the report of insufficient replicas, which

Re: [zfs-discuss] bug id 6343667

2008-07-05 Thread Jeff Bonwick
FYI, we are literally just days from having this fixed. Matt: after putback you really should blog about this one -- both to let people know that this long-standing bug has been fixed, and to describe your approach to it. It's a surprisingly tricky and interesting problem. Jeff On Sat, Jul 05,

Re: [zfs-discuss] Changing GUID

2008-07-02 Thread Jeff Bonwick
How difficult would it be to write some code to change the GUID of a pool? As a recreational hack, not hard at all. But I cannot recommend it in good conscience, because if the pool contains more than one disk, the GUID change cannot possibly be atomic. If you were to crash or lose power in

Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
To be honest, it is not quite clear to me, how we might utilize dumpadm(1M) to help us to calculate/recommend size of dump device. Could you please elaborate more on this ? dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus current process, or all memory. If the dump
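
For reference, the dumpadm(1M) controls under discussion (device path illustrative):

    # dumpadm                                # show the current dump configuration
    # dumpadm -c kernel                      # content: kernel, curproc, or all
    # dumpadm -d /dev/zvol/dsk/rpool/dump    # point the dump device at a ZFS volume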

Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-07-01 Thread Jeff Bonwick
The problem is that size-capping is the only control we have over thrashing right now. It's not just thrashing, it's also any application that leaks memory. Without a cap, the broken application would continue plowing through memory until it had consumed every free block in the storage pool.

Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-30 Thread Jeff Bonwick
Neither swap nor dump is mandatory for running Solaris. Dump is mandatory in the sense that losing crash dumps is criminal. Swap is more complex. It's certainly not mandatory. Not so long ago, swap was typically larger than physical memory. But in recent years, we've essentially moved to a

Re: [zfs-discuss] zpool with RAID-5 from intelligent storage arrays

2008-06-30 Thread Jeff Bonwick
Using ZFS to mirror two hardware RAID-5 LUNs is actually quite nice. Because the data is mirrored at the ZFS level, you get all the benefits of self-healing. Moreover, you can survive a great variety of hardware failures: three or more disks can die (one in the first array, two or more in the
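
A sketch of the layout being described (hypothetical device names; each LUN is a hardware RAID-5 volume from a different array):

    # zpool create tank mirror c3t0d0 c4t0d0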

Re: [zfs-discuss] ZFS Deferred Frees

2008-06-30 Thread Jeff Bonwick
When a block is freed as part of transaction group N, it can be reused in transaction group N+1. There's at most a one-txg (few-second) delay. Jeff On Mon, Jun 16, 2008 at 01:02:53PM -0400, Torrey McMahon wrote: I'm doing some simple testing of ZFS block reuse and was wondering when deferred

Re: [zfs-discuss] zfs mirror broken?

2008-06-20 Thread Jeff Bonwick
If you say 'zpool online pool disk' that should tell ZFS that the disk is healthy again and automatically kick off a resilver. Of course, that should have happened automatically. What version of ZFS / Solaris are you running? Jeff On Fri, Jun 20, 2008 at 06:01:25PM +0200, Justin Vassallo

Re: [zfs-discuss] Cannot delete errored file

2008-06-10 Thread Jeff Bonwick
That's odd -- the only way the 'rm' should fail is if it can't read the znode for that file. The znode is metadata, and is therefore stored in two distinct places using ditto blocks. So even if you had one unlucky copy that was damaged on two of your disks, you should still have another copy

Re: [zfs-discuss] [caiman-discuss] disk names?

2008-06-04 Thread Jeff Bonwick
I agree with that. format(1M) and cfgadm(1M) are, ah, not the most user-friendly tools. It would be really nice to have 'zpool disks' go out and taste all the drives to see which ones are available. We already have most of the code to do it. 'zpool import' already contains the

Re: [zfs-discuss] ZFS with raidz

2008-05-30 Thread Jeff Bonwick
Very cool! Just one comment. You said: We'll try compression level #9. gzip-9 is *really* CPU-intensive, often for little gain over gzip-1. As in, it can take 100 times longer and yield just a few percent gain. The CPU cost will limit write bandwidth to a few MB/sec per core. I'd suggest
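
For comparison, a sketch of trying a cheaper gzip level (dataset name hypothetical):

    # zfs set compression=gzip-1 tank/data    # far cheaper, usually close to gzip-9
    # zfs get compressratio tank/data         # see what the cheaper setting actually yields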

Re: [zfs-discuss] recovering data from a detached mirrored vdev

2008-05-07 Thread Jeff Bonwick
Yes, I think that would be useful. Something like 'zpool revive' or 'zpool undead'. It would not be completely general-purpose -- in a pool with multiple mirror devices, it could only work if all replicas were detached in the same txg -- but for the simple case of a single top-level mirror vdev,

Re: [zfs-discuss] recovering data from a detached mirrored vdev

2008-05-04 Thread Jeff Bonwick
Oh, you're right! Well, that will simplify things! All we have to do is convince a few bits of code to ignore ub_txg == 0. I'll try a couple of things and get back to you in a few hours... Jeff On Fri, May 02, 2008 at 03:31:52AM -0700, Benjamin Brumaire wrote: Hi, while diving deeply in

Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Jeff Bonwick
It's OK that you're missing labels 2 and 3 -- there are four copies precisely so that you can afford to lose a few. Labels 2 and 3 are at the end of the disk. The fact that only they are missing makes me wonder if someone resized the LUNs. Growing them would be OK, but shrinking them would
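
For reference, the four labels can be dumped directly (device path hypothetical); labels 0 and 1 sit at the front of the device, 2 and 3 at the end, which is why a shrunk LUN loses exactly the last two:

    # zdb -l /dev/rdsk/c0t0d0s0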

Re: [zfs-discuss] lost zpool when server restarted.

2008-05-04 Thread Jeff Bonwick
Looking at the txg numbers, it's clear that labels on two devices that are unavailable now may be stale: Actually, they look OK. The txg values in the label indicate the last txg in which the pool configuration changed for devices in that top-level vdev (e.g. mirror or raid-z group), not the

Re: [zfs-discuss] recovering data from a detached mirrored vdev

2008-05-04 Thread Jeff Bonwick
as the name of the missing device. Good luck, and please let us know how it goes! Jeff On Sat, May 03, 2008 at 10:48:34PM -0700, Jeff Bonwick wrote: Oh, you're right! Well, that will simplify things! All we have to do is convince a few bits of code to ignore ub_txg == 0. I'll try a couple

Re: [zfs-discuss] recovering data from a detached mirrored vdev

2008-05-04 Thread Jeff Bonwick
, 2008 at 01:21:27AM -0700, Jeff Bonwick wrote: OK, here you go. I've successfully recovered a pool from a detached device using the attached binary. You can verify its integrity against the following MD5 hash: # md5sum labelfix ab4f33d99fdb48d9d20ee62b49f11e20 labelfix It takes just

Re: [zfs-discuss] Issue with simultaneous IO to lots of ZFS pools

2008-04-30 Thread Jeff Bonwick
Indeed, things should be simpler with fewer (generally one) pool. That said, I suspect I know the reason for the particular problem you're seeing: we currently do a bit too much vdev-level caching. Each vdev can have up to 10MB of cache. With 132 pools, even if each pool is just a single iSCSI

Re: [zfs-discuss] recovering data from a detached mirrored vdev

2008-04-29 Thread Jeff Bonwick
If your entire pool consisted of a single mirror of two disks, A and B, and you detached B at some point in the past, you *should* be able to recover the pool as it existed when you detached B. However, I just tried that experiment on a test pool and it didn't work. I will investigate further

Re: [zfs-discuss] recovering data from a detached mirrored vdev

2008-04-29 Thread Jeff Bonwick
Urgh. This is going to be harder than I thought -- not impossible, just hard. When we detach a disk from a mirror, we write a new label to indicate that the disk is no longer in use. As a side effect, this zeroes out all the old uberblocks. That's the bad news -- you have no uberblocks. The

Re: [zfs-discuss] Performance of one single 'cp'

2008-04-14 Thread Jeff Bonwick
No, that is definitely not expected. One thing that can hose you is having a single disk that performs really badly. I've seen disks as slow as 5 MB/sec due to vibration, bad sectors, etc. To see if you have such a disk, try my diskqual.sh script (below). On my desktop system, which has 8
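
The diskqual.sh script itself is cut off in this preview; below is only a rough sketch of the same idea -- time a short sequential read from each disk -- with hypothetical device names:

    #!/bin/sh
    # crude per-disk read test (not the script from the thread)
    for d in c0t0d0 c0t1d0 c0t2d0; do
        echo "$d: \c"
        ptime dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=128 2>&1 | grep real
    done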

Re: [zfs-discuss] zfs filesystem metadata checksum

2008-04-14 Thread Jeff Bonwick
Not at present, but it's a good RFE. Unfortunately it won't be quite as simple as just adding an ioctl to report the dnode checksum. To see why, consider a file with one level of indirection: that is, it consists of a dnode, a single indirect block, and several data blocks. The indirect block

Re: [zfs-discuss] Per filesystem scrub

2008-04-05 Thread Jeff Bonwick
Aye, or better yet -- give the scrub/resilver/snap reset issue fix very high priority. As it stands snapshots are impossible when you need to resilver and scrub (even on supposedly sun supported thumper configs). No argument. One of our top engineers is working on this as we speak. I say

Re: [zfs-discuss] Per filesystem scrub

2008-03-31 Thread Jeff Bonwick
Peter, That's a great suggestion. And as fortune would have it, we have the code to do it already. Scrubbing in ZFS is driven from the logical layer, not the physical layer. When you scrub a pool, you're really just scrubbing the pool-wide metadata, then scrubbing each filesystem. At 50,000

Re: [zfs-discuss] ZFS performance lower than expected

2008-03-26 Thread Jeff Bonwick
The disks in the SAN servers were indeed striped together with Linux LVM and exported as a single volume to ZFS. That is really going to hurt. In general, you're much better off giving ZFS access to all the individual LUNs. The intermediate LVM layer kills the concurrency that's native to

Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

2008-03-02 Thread Jeff Bonwick
Nathan: yes. Flipping each bit and recomputing the checksum is not only possible, we actually did it in early versions of the code. The problem is that it's really expensive. For a 128K block, that's a million bits, so you have to re-run the checksum a million times, on 128K of data. That's
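
Rough numbers behind that, consistent with the message: a 128 KB block is 1,048,576 bits, so there are about a million candidate single-bit flips, and each candidate means re-checksumming the full 128 KB -- roughly 1,048,576 x 128 KB, or about 128 GB of data pushed through the checksum for one damaged block.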

Re: [zfs-discuss] Cause for data corruption?

2008-02-29 Thread Jeff Bonwick
I thought RAIDZ would correct data errors automatically with the parity data. Right. However, if the data is corrupted while in memory (e.g. on a PC with non-parity memory), there's nothing ZFS can do to detect that. I mean, not even theoretically. The best we could do would be to narrow the

Re: [zfs-discuss] moving zfs filesystems between disks

2008-02-27 Thread Jeff Bonwick
Yes. Just say this: # zpool replace mypool disk1 disk2 This will do all the intermediate steps you'd expect: attach disk2 as a mirror of disk1, resilver, detach disk1, and grow the pool to reflect the larger size of disk2. Jeff On Wed, Feb 27, 2008 at 04:48:59PM -0800, Bill Shannon wrote:

Re: [zfs-discuss] moving zfs filesystems between disks

2008-02-27 Thread Jeff Bonwick
the larger size of newdisk. Jeff On Wed, Feb 27, 2008 at 05:04:02PM -0800, Jeff Bonwick wrote: Yes. Just say this: # zpool replace mypool disk1 disk2 This will do all the intermediate steps you'd expect: attach disk2 as a mirror of disk1, resilver, detach disk2, and grow the pool

Re: [zfs-discuss] raidz2 resilience on 3 disks

2008-02-21 Thread Jeff Bonwick
1) If i create a raidz2 pool on some disks, start to use it, then the disks' controllers change. What will happen to my zpool? Will it be lost or is there some disk tagging which allows zfs to recognise the disks? It'll be fine. ZFS opens by path, but then checks both the devid and the

Re: [zfs-discuss] Lost intermediate snapshot; incremental backup still possible?

2008-02-12 Thread Jeff Bonwick
I think so. On your backup pool, roll back to the last snapshot that was successfully received. Then you should be able to send an incremental between that one and the present. Jeff On Thu, Feb 07, 2008 at 08:38:38AM -0800, Ian wrote: I keep my system synchronized to a USB disk from time to
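
A sketch of the recovery path being described, with hypothetical names (source tank/fs, backup pool bpool):

    # zfs rollback -r bpool/fs@lastgood        # discard anything newer than the last snapshot that arrived intact
    # zfs send -i @lastgood tank/fs@today | zfs recv bpool/fs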

Re: [zfs-discuss] Issue fixing ZFS corruption

2008-01-23 Thread Jeff Bonwick
The Silicon Image 3114 controller is known to corrupt data. Google for silicon image 3114 corruption to get a flavor. I'd suggest getting your data onto different h/w, quickly. Jeff On Wed, Jan 23, 2008 at 12:34:56PM -0800, Bertrand Sirodot wrote: Hi, I have been experiencing corruption on

Re: [zfs-discuss] Issue fixing ZFS corruption

2008-01-23 Thread Jeff Bonwick
Actually s10_72, but it's not really a fix, it's a workaround for a bug in the hardware. I don't know how effective it is. Jeff On Wed, Jan 23, 2008 at 04:54:54PM -0800, Erast Benson wrote: I believe issue been fixed in snv_72+, no? On Wed, 2008-01-23 at 16:41 -0800, Jeff Bonwick wrote

Re: [zfs-discuss] x4500 recommendations for netbackup dsu?

2007-12-20 Thread Jeff Bonwick
Yep, compression is generally a nice win for backups. The amount of compression will depend on the nature of the data. If it's all mpegs, you won't see any advantage because they're already compressed. But for just about everything else, 2-3x is typical. As for hot spares, they are indeed

Re: [zfs-discuss] ZFS Roadmap - thoughts on expanding raidz / restriping / defrag

2007-12-17 Thread Jeff Bonwick
In short, yes. The enabling technology for all of this is something we call bp rewrite -- that is, the ability to rewrite an existing block pointer (bp) to a new location. Since ZFS is COW, this would be trivial in the absence of snapshots -- just touch all the data. But because a block may

Re: [zfs-discuss] Best option for my home file server?

2007-09-26 Thread Jeff Bonwick
I would keep it simple. Let's call your 250GB disks A, B, C, D, and your 500GB disks X and Y. I'd either make them all mirrors: zpool create mypool mirror A B mirror C D mirror X Y or raidz the little ones and mirror the big ones: zpool create mypool raidz A B C D mirror X Y or, as

Re: [zfs-discuss] ZFS panic when trying to import pool

2007-09-18 Thread Jeff Bonwick
Basically, it is complaining that there aren't enough disks to read the pool metadata. This would suggest that in your 3-disk RAID-Z config, either two disks are missing, or one disk is missing *and* another disk is damaged -- due to prior failed writes, perhaps. (I know there's at least one

Re: [zfs-discuss] ZFS RAIDZ vs. RAID5.

2007-09-11 Thread Jeff Bonwick
As you can see, two independent ZFS blocks share one parity block. COW won't help you here, you would need to be sure that each ZFS transaction goes to a different (and free) RAID5 row. This is, I believe, the main reason why poor RAID5 wasn't used in the first place. Exactly right. RAID-Z

Re: [zfs-discuss] Mysterious corruption with raidz2 vdev

2007-07-30 Thread Jeff Bonwick
I suspect this is a bug in raidz error reporting. With a mirror, each copy either checksums correctly or it doesn't, so we know which drives gave us bad data. With RAID-Z, we have to infer which drives have damage. If the number of drives returning bad data is less than or equal to the number

Re: [zfs-discuss] ZFS raid is very slow???

2007-07-06 Thread Jeff Bonwick
A couple of questions for you: (1) What OS are you running (Solaris, BSD, MacOS X, etc)? (2) What's your config? In particular, are any of the partitions on the same disk? (3) Are you copying a few big files or lots of small ones? (4) Have you measured UFS-to-UFS and ZFS-to-ZFS

Re: [zfs-discuss] Re: zfs reports small st_size for directories?

2007-06-09 Thread Jeff Bonwick
What was the reason to make ZFS use directory sizes as the number of entries rather than the way other Unix filesystems use it? In UFS, the st_size is the size of the directory inode as though it were a file. The only reason it's like that is that UFS is sloppy and lets you cat directories --

Re: [zfs-discuss] Multiple filesystem costs? Directory sizes?

2007-05-01 Thread Jeff Bonwick
Mario, For the reasons you mentioned, having a few different filesystems (on the order of 5-10, I'd guess) can be handy. Any time you want different behavior for different types of data, multiple filesystems are the way to go. For maximum directory size, it turns out that the practical limits

Re: [zfs-discuss] ZFS stalling problem

2007-03-04 Thread Jeff Bonwick
Jesse, This isn't a stall -- it's just the natural rhythm of pushing out transaction groups. ZFS collects work (transactions) until either the transaction group is full (measured in terms of how much memory the system has), or five seconds elapse -- whichever comes first. Your data would seem

Re: [zfs-discuss] FAULTED ZFS volume even though it is mirrored

2007-03-01 Thread Jeff Bonwick
However, I logged in this morning to discover that the ZFS volume could not be read. In addition, it appears to have marked all drives, mirrors, and the volume itself as 'corrupted'. One possibility: I've seen this happen when a system doesn't shut down cleanly after the last change to the pool

Re: [zfs-discuss] Does running redundancy with ZFS use as much disk space as doubling drives?

2007-02-26 Thread Jeff Bonwick
On Mon, Feb 26, 2007 at 01:53:17AM -0800, Tor wrote: [...] if using redundancy on ZDF The ZFS Document Format? ;-) uses less disk space as simply getting extra drives and do identical copies, with periodic CRC checks of the source material to check the health. If you create a 2-disk mirror,

Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick
Do you agree that there is a major tradeoff in building up a wad of transactions in memory? I don't think so. We trigger a transaction group commit when we have lots of dirty data, or 5 seconds elapse, whichever comes first. In other words, we don't let updates get stale. Jeff

Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick
That is interesting. Could this account for disproportionate kernel CPU usage for applications that perform I/O one byte at a time, as compared to other filesystems? (Nevermind that the application shouldn't do that to begin with.) No, this is entirely a matter of CPU efficiency in the current

Re: [zfs-discuss] Re: ZFS vs NFS vs array caches, revisited

2007-02-11 Thread Jeff Bonwick
How did ZFS striped on 7 slices of an FC-SATA LUN via NFS work 146 times faster than ZFS on 1 slice of the same LUN via NFS??? Without knowing more I can only guess, but most likely it's a simple matter of working set. Suppose the benchmark in question has a 4G working set,

Re: [zfs-discuss] zfs corruption -- odd inum?

2007-02-11 Thread Jeff Bonwick
The object number is in hex. 21e282 hex is 2220674 decimal -- give that a whirl. This is all better now thanks to some recent work by Eric Kustarz: 6410433 'zpool status -v' would be more useful with filenames This was integrated into Nevada build 57. Jeff On Sat, Feb 10, 2007 at 05:18:05PM
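
The conversion itself, for reference (0x21e282 hex is 2220674 decimal):

    $ printf '%d\n' 0x21e282
    2220674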

Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Jeff Bonwick
On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote: On January 27, 2007 12:27:17 AM -0200 Toby Thain [EMAIL PROTECTED] wrote: On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote: 3. I created file system with huge amount of data, where most of the data is read-only. I change my

Re: [zfs-discuss] File Space Allocation

2006-11-04 Thread Jeff Bonwick
Where can I find information on the file allocation methodology used by ZFS? You've inspired me to blog again: http://blogs.sun.com/bonwick/entry/zfs_block_allocation I'll describe the way we manage free space in the next post. Jeff

Re: [zfs-discuss] Re: Re: Snapshots impact on performance

2006-10-29 Thread Jeff Bonwick
Nice, this is definitely pointing the finger more definitively. Next time could you try: dtrace -n '[EMAIL PROTECTED](20)] = count()}' -c 'sleep 5' (just send the last 10 or so stack traces) In the mean time I'll talk with our SPA experts and see if I can figure out how to fix

Re: [zfs-discuss] Re: Corrupted LUN in RAIDZ group -- How to repair?

2006-09-10 Thread Jeff Bonwick
It looks like now the scrub has completed. Should I now clear these warnings? Yep. You survived the Unfortunate Event unscathed. You're golden. Jeff

Re: [zfs-discuss] Re: system unresponsive after issuing a zpool attach

2006-08-17 Thread Jeff Bonwick
And it started replacement/resilvering... after a few minutes the system became unavailable. Reboot only gives me a few minutes, then resilvering makes the system unresponsive. Is there any workaround or patch for this problem??? Argh, sorry -- the problem is that we don't do aggressive enough

Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-03 Thread Jeff Bonwick
ZFS will try to enable the write cache if whole disks are given. Additionally, keep in mind that the outer region of a disk is much faster. And it's portable. If you use whole disks, you can export the pool from one machine and import it on another. There's no way to export just one slice and leave

Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-03 Thread Jeff Bonwick
is zfs any less efficient with just using a portion of a disk versus the entire disk? As others mentioned, if we're given a whole disk (i.e. no slice is specified) then we can safely enable the write cache. One other effect -- probably not huge -- is that the block placement algorithm is most
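
For illustration, the two cases side by side (hypothetical device names):

    # zpool create tank c1t0d0      # whole disk: ZFS can safely enable the write cache
    # zpool create tank c1t0d0s0    # one slice: the cache is left alone, since other consumers may share the disk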

Re: [zfs-discuss] ZFS performance using slices vs. entire disk?

2006-08-03 Thread Jeff Bonwick
With all of the talk about performance problems due to ZFS doing a sync to force the drives to commit to data being on disk, how much of a benefit is this - especially for NFS? It depends. For some drives it's literally 10x. Also, if I was lucky enough to have a working prestoserv card

Re: [zfs-discuss] sharing a storage array

2006-07-28 Thread Jeff Bonwick
bonus questions: any idea when hot spares will make it to S10? good question :) It'll be in U3, and probably available as patches for U2 as well. The reason for U2 patches is Thumper (x4500), because we want ZFS on Thumper to have hot spares and double-parity RAID-Z from day one. Jeff

Re: [zfs-discuss] persistent errors - which file?

2006-07-27 Thread Jeff Bonwick
I've a non-mirrored zfs file system which shows the status below. I saw the thread in the archives about working this out but it looks like ZFS messages have changed. How do I find out what file(s) this is? [...] errors: The following persistent errors have been detected:
