Re: [zfs-discuss] Directory is not accessible
unlink(1M)?

cheers,
--justin

From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) opensolarisisdeadlongliveopensola...@nedharvey.com
To: Sami Tuominen sami.tuomi...@tut.fi; zfs-discuss@opensolaris.org
Sent: Monday, 26 November 2012, 14:57
Subject: Re: [zfs-discuss] Directory is not accessible

> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Sami Tuominen
>
> How can one remove a directory containing corrupt files, or a corrupt
> file itself? For me rm just gives input/output error.

I was hoping to see somebody come up with an answer for this... I would expect rm to work. Maybe you have to rm the parent of the thing you're trying to rm? But I kinda doubt it. Maybe you need to verify you're rm'ing the right thing? I believe that if you scrub the pool, it should tell you the names of the corrupt things.

Or maybe you're not experiencing a simple cksum mismatch; maybe you're experiencing a legitimate I/O error. The rm solution could only possibly work to clear up a cksum mismatch.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS ok for single disk dev box?
> has only one drive. If ZFS detects something bad it might kernel panic
> and lose the whole system, right?

What do you mean by "lose the whole system"? A panic is not a bad thing, and it does not imply that the machine will not reboot successfully. It certainly doesn't guarantee your OS will be trashed.

> I realize UFS /might/ be ignorant of any corruption but it might be
> more usable and go happily on its way without noticing?

UFS has a mount option, onerror, which defines what the OS will do if a problem is detected with a given filesystem. I think the default is panic anyway; check the mount_ufs manpage for details.

Your answer is to take regular backups, rather than bury your head in the sand.

cheers,
--justin
Re: [zfs-discuss] ZFS ok for single disk dev box?
> would be very annoying if ZFS barfed on a technicality and I had to
> reinstall the whole OS because of a kernel panic and an unbootable
> system. Is this a known scenario with ZFS then?

I can't recall hearing of this happening. I've seen plenty of UFS filesystems dying with "panic: freeing free", and then the ensuing fsck-athon convinces the user to just rebuild the filesystem in question.

cheers,
--justin
Re: [zfs-discuss] number of blocks changes
> Can you check whether this happens from /dev/urandom as well?

It does:

finsdb137@root dd if=/dev/urandom of=oub bs=128k count=1
while true
do
  ls -s oub
  sleep 1
done
0+1 records in
0+1 records out
1 oub
1 oub
1 oub
1 oub
1 oub
4 oub
4 oub
4 oub
4 oub
4 oub
Re: [zfs-discuss] number of blocks changes
> I think for the cleanness of the experiment, you should also include
> sync after the dd's, to actually commit your file to the pool.

OK, that 'fixes' it:

finsdb137@root dd if=/dev/random of=ob bs=128k count=1
sync
while true
do
  ls -s ob
  sleep 1
done
0+1 records in
0+1 records out
4 ob
4 ob
4 ob
.. etc.

I guess I knew this had something to do with stuff being flushed to disk; I don't know why I didn't think of it myself.

> What is the pool's redundancy setting?

copies=1. Full zfs get below, but in short, it's a basic mirrored root with default settings. Hmm, maybe I should mirror root with copies=2. ;)

> I am not sure what ls -s actually accounts for as the file's FS-block
> usage, but I wonder if it might include metadata (relevant pieces of
> the block pointer tree individual to the file). Also check if the disk
> usage reported by du -k ob varies similarly, for the fun of it?

Yes, it varies too.

finsdb137@root dd if=/dev/random of=ob bs=128k count=1
while true
do
  ls -s ob
  du -k ob
  sleep 1
done
0+1 records in
0+1 records out
1 ob
0 ob
1 ob
0 ob
1 ob
0 ob
1 ob
0 ob
4 ob
2 ob
4 ob
2 ob
4 ob
2 ob
4 ob
2 ob
4 ob
2 ob

finsdb137@root zfs get all rpool/ROOT/s10s_u9wos_14a
NAME                       PROPERTY              VALUE                  SOURCE
rpool/ROOT/s10s_u9wos_14a  type                  filesystem             -
rpool/ROOT/s10s_u9wos_14a  creation              Tue Mar 1 15:09 2011   -
rpool/ROOT/s10s_u9wos_14a  used                  20.6G                  -
rpool/ROOT/s10s_u9wos_14a  available             37.0G                  -
rpool/ROOT/s10s_u9wos_14a  referenced            20.6G                  -
rpool/ROOT/s10s_u9wos_14a  compressratio         1.00x                  -
rpool/ROOT/s10s_u9wos_14a  mounted               yes                    -
rpool/ROOT/s10s_u9wos_14a  quota                 none                   default
rpool/ROOT/s10s_u9wos_14a  reservation           none                   default
rpool/ROOT/s10s_u9wos_14a  recordsize            128K                   default
rpool/ROOT/s10s_u9wos_14a  mountpoint            /                      local
rpool/ROOT/s10s_u9wos_14a  sharenfs              off                    default
rpool/ROOT/s10s_u9wos_14a  checksum              on                     default
rpool/ROOT/s10s_u9wos_14a  compression           off                    default
rpool/ROOT/s10s_u9wos_14a  atime                 on                     default
rpool/ROOT/s10s_u9wos_14a  devices               on                     default
rpool/ROOT/s10s_u9wos_14a  exec                  on                     default
rpool/ROOT/s10s_u9wos_14a  setuid                on                     default
rpool/ROOT/s10s_u9wos_14a  readonly              off                    default
rpool/ROOT/s10s_u9wos_14a  zoned                 off                    default
rpool/ROOT/s10s_u9wos_14a  snapdir               hidden                 default
rpool/ROOT/s10s_u9wos_14a  aclmode               groupmask              default
rpool/ROOT/s10s_u9wos_14a  aclinherit            restricted             default
rpool/ROOT/s10s_u9wos_14a  canmount              noauto                 local
rpool/ROOT/s10s_u9wos_14a  shareiscsi            off                    default
rpool/ROOT/s10s_u9wos_14a  xattr                 on                     default
rpool/ROOT/s10s_u9wos_14a  copies                1                      default
rpool/ROOT/s10s_u9wos_14a  version               3                      -
rpool/ROOT/s10s_u9wos_14a  utf8only              off                    -
rpool/ROOT/s10s_u9wos_14a  normalization         none                   -
rpool/ROOT/s10s_u9wos_14a  casesensitivity       sensitive              -
rpool/ROOT/s10s_u9wos_14a  vscan                 off                    default
rpool/ROOT/s10s_u9wos_14a  nbmand                off                    default
rpool/ROOT/s10s_u9wos_14a  sharesmb              off                    default
rpool/ROOT/s10s_u9wos_14a  refquota              none                   default
rpool/ROOT/s10s_u9wos_14a  refreservation        none                   default
rpool/ROOT/s10s_u9wos_14a  primarycache          all                    default
rpool/ROOT/s10s_u9wos_14a  secondarycache        all                    default
rpool/ROOT/s10s_u9wos_14a  usedbysnapshots       0                      -
rpool/ROOT/s10s_u9wos_14a  usedbydataset         20.6G                  -
rpool/ROOT/s10s_u9wos_14a  usedbychildren        0                      -
rpool/ROOT/s10s_u9wos_14a  usedbyrefreservation  0                      -
rpool/ROOT/s10s_u9wos_14a  logbias               latency                default
[zfs-discuss] number of blocks changes
While this isn't causing me any problems, I'm curious as to why this is happening:

$ dd if=/dev/random of=ob bs=128k count=1
while true
do
  ls -s ob
  sleep 1
done
0+1 records in
0+1 records out
1 ob
1 ob
1 ob
[... "1 ob" repeated once a second for roughly 30 seconds ...]
1 ob
4 ob    <- changes here
4 ob
4 ob
^C
$ ls -l ob
-rw-r--r-- 1 justin staff 1040 Aug 3 09:28 ob

I was expecting the '1', since this is a zfs with recordsize=128k. Not sure I understand the '4', or why it happens ~30s later. Can anyone distribute clue in my direction?

s10u10, running 144488-06 KU. zfs is v4, pool is v22.

cheers,
--justin
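For what it's worth, the number ls -s prints is just the stat st_blocks field in 512-byte units, and (as the follow-ups in this thread confirm) on ZFS that figure only settles once the pending writes are committed to disk. A minimal Python sketch of where the number comes from, using plain POSIX stat and nothing ZFS-specific (the block count you see will depend on whatever filesystem you run it on):

```python
import os
import tempfile

# ls -s reports st_blocks, which is defined in 512-byte units; on ZFS
# this value only settles after the pending transaction group commits,
# which is why it can change ~30 seconds after the write.
fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(1040))   # same size as the 'ob' file above
os.fsync(fd)
os.close(fd)

st = os.stat(path)
# st_size is 1040; st_blocks depends on the filesystem this runs on
print(st.st_size, st.st_blocks)
os.unlink(path)
```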
Re: [zfs-discuss] New fast hash algorithm - is it needed?
> You do realize that the age of the universe is only on the order of
> around 10^18 seconds, do you? Even if you had a trillion CPUs each
> chugging along at 3.0 GHz for all this time, the number of processor
> cycles you will have executed cumulatively is only on the order of
> 10^40, still 37 orders of magnitude lower than the chance for a random
> hash collision.

Here we go, boiling the oceans again. :)

> Suppose you find a weakness in a specific hash algorithm; you use this
> to create hash collisions, and now imagine you store the hash
> collisions in a zfs dataset with dedup enabled using the same hash
> algorithm.

Sorry, but isn't this exactly what dedup=verify solves? I don't see the problem here. Maybe all that's needed is a comment in the manpage saying hash algorithms aren't perfect.

cheers,
--justin
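To put rough numbers on the collision argument: the standard birthday bound says the chance that any two of n random 256-bit hashes collide is about n^2 / 2^257. A small sketch, using a hypothetical pool size picked purely for illustration (1 PiB of unique 128 KiB blocks), and working in log10 so the huge exponents never overflow a float:

```python
import math

# Birthday bound: probability that any two of n random hashes collide
# is roughly n^2 / 2^(hash_bits + 1). Return the base-10 exponent.
def log10_collision_probability(n_blocks, hash_bits=256):
    return 2 * math.log10(n_blocks) - (hash_bits + 1) * math.log10(2)

# Hypothetical pool for illustration: 1 PiB of unique 128 KiB blocks.
n = 2**50 // 2**17                                   # ~8.6e9 blocks
print(f"~10^{log10_collision_probability(n):.0f}")   # around 10^-57
```

Even at that scale the accidental-collision probability is vanishingly small, which is the point the quoted poster is making.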
Re: [zfs-discuss] New fast hash algorithm - is it needed?
> The point is that hash functions are many-to-one, and I think the
> point was that verify isn't really needed if the hash function is good
> enough.

This is a circular argument really, isn't it? Hash algorithms are never perfect, but we're trying to build a perfect one?

It seems to me the obvious fix is to use the hash to identify candidates for dedup, and then do the actual verify and dedup asynchronously. Perhaps a worker thread doing this at low priority? Did anyone consider this?

cheers,
--justin
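A minimal sketch of that candidate-then-verify idea in Python (all names here are hypothetical and this has nothing to do with the actual ZFS write pipeline): the write path only computes the hash, and a background worker later does the byte-for-byte comparison before a block is treated as a true duplicate.

```python
import hashlib
import queue
import threading

store = {}                   # digest -> canonical block bytes
verify_queue = queue.Queue() # candidates awaiting byte-for-byte verify
deduped = []                 # digests confirmed as true duplicates

def write_block(data: bytes) -> str:
    """Fast path: hash only. Matching hashes are just candidates."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in store:
        verify_queue.put((digest, data))   # verify asynchronously
    else:
        store[digest] = data
    return digest

def verifier():
    """Low-priority worker: the actual read + compare happens here."""
    while True:
        item = verify_queue.get()
        if item is None:
            break
        digest, data = item
        if store[digest] == data:          # byte-for-byte check
            deduped.append(digest)
        verify_queue.task_done()

worker = threading.Thread(target=verifier, daemon=True)
worker.start()

write_block(b"hello")
write_block(b"hello")        # same content: queued for async verify
verify_queue.join()          # wait for the worker to catch up
print(len(deduped))          # 1
verify_queue.put(None)       # shut the worker down
worker.join()
```

The write path never blocks on a read; only the verifier pays the I/O cost, which is the low-priority-thread trade-off suggested above.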
Re: [zfs-discuss] New fast hash algorithm - is it needed?
> This assumes you have low volumes of deduplicated data. As your dedup
> ratio grows, so does the performance hit from dedup=verify. At, say,
> dedupratio=10.0x, on average, every write results in 10 reads.

Well, you can't make an omelette without breaking eggs! Not a very nice one, anyway. Yes, dedup is expensive, but much as with using O_SYNC, it's a conscious decision to take a performance hit in order to be sure about our data. Moving the actual reads to an async thread, as I suggested, should improve things.

cheers,
--justin
Re: [zfs-discuss] New fast hash algorithm - is it needed?
> Since there is a finite number of bit patterns per block, have you
> tried to just calculate the SHA-256 or SHA-512 for every possible bit
> pattern to see if there is ever a collision? If you found an algorithm
> that produced no collisions for any possible block bit pattern,
> wouldn't that be the win?

Perhaps I've missed something, but if there were *never* a collision, you'd have stumbled across a rather impressive lossless compression algorithm. I'm pretty sure there are some Big Mathematical Rules (Shannon?) that mean this cannot be.

cheers,
--justin
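The counting argument is easy to see at toy scale: shrink the block to 2 bytes and the hash to 8 bits, and collisions become not just possible but guaranteed, since 65536 inputs cannot fit injectively into 256 buckets. The same pigeonhole reasoning, scaled up, is why no fixed-size hash can be collision-free over 128 KiB blocks (a collision-free one would indeed be an impossible lossless compressor):

```python
import hashlib
from collections import defaultdict

# Pigeonhole demo: map every 2-byte input (65536 of them) through an
# 8-bit truncated hash (256 buckets). Some bucket must hold >= 256
# inputs, so collisions are unavoidable.
buckets = defaultdict(list)
for i in range(2**16):
    data = i.to_bytes(2, "big")
    h = hashlib.sha256(data).digest()[0]   # first byte: 8-bit "hash"
    buckets[h].append(data)

print(max(len(v) for v in buckets.values()) > 1)   # True
```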
Re: [zfs-discuss] SPARC SATA, please.
Richard Elling wrote:
> Miles Nordin wrote:
>> "ave" == Andre van Eyssen an...@purplecow.org writes:
>> "et" == Erik Trimble erik.trim...@sun.com writes:
>> "ea" == Erik Ableson eable...@mac.com writes:
>> "edm" == Eric D. Mudama edmud...@bounceswoosh.org writes:
>>
>> ave> The LSI SAS controllers with SATA ports work nicely with SPARC.
>>
>> I think what you mean is ``some LSI SAS controllers work nicely with
>> SPARC''. It would help if you tell exactly which one you're using. I
>> thought the LSI 1068 do not work with SPARC (mfi driver, x86 only).
>
> Sun has been using the LSI 1068[E] and its cousin, the 1064[E], in
> SPARC machines for many years. In fact, I can't think of a SPARC
> machine in the current product line that does not use either the 1068
> or the 1064 (I'm sure someone will correct me, though ;-)
> -- richard

Might be worth having a look at the T1000 to see what's in there. We used to ship those with SATA drives in.

cheers,
--justin
Re: [zfs-discuss] ZFS deduplication
> with other Word files. You will thus end up seeking all over the disk
> to read _most_ Word files. Which really sucks.

<snip>

> very limited, constrained usage. Disk is just so cheap that you
> _really_ have to have an enormous amount of dup before the performance
> penalties of dedup are countered.

Neither of these holds true for SSDs, though, does it? Seeks are essentially free, and the devices are not cheap.

cheers,
--justin
Re: [zfs-discuss] ZFS deduplication
> Raw storage space is cheap. Managing the data is what is expensive.

Not for my customer. Internal accounting means that the storage team gets paid for each allocated GB on a monthly basis. They have stacks of IO bandwidth and CPU cycles to spare outside of their daily busy period. I can't think of a better spend of their time than a scheduled dedup.

> Perhaps deduplication is a response to an issue which should be solved
> elsewhere?

I don't think you can make this generalisation. For most people, yes, but not everyone.

cheers,
--justin
Re: [zfs-discuss] ZFS deduplication
> Does anyone know of a tool that can look over a dataset and give
> duplication statistics? I'm not looking for something incredibly
> efficient, but I'd like to know how much it would actually benefit our

Check out the following blog:

http://blogs.sun.com/erickustarz/entry/how_dedupalicious_is_your_pool
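If you just want a rough number, a crude block-level estimator is also easy to sketch in Python (a hypothetical script, not the tool from the blog post; it hashes fixed-size chunks and ignores recordsize and alignment subtleties, so treat the ratio as an upper-bound guess):

```python
import hashlib
import os
import tempfile

def dedup_estimate(root, blocksize=128 * 1024):
    """Count total vs unique fixed-size chunks under a directory."""
    seen, total = set(), 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    while chunk := f.read(blocksize):
                        total += 1
                        seen.add(hashlib.sha256(chunk).digest())
            except OSError:
                continue                  # skip unreadable files
    return total, len(seen)

# Demo on a throwaway directory: two identical files plus one unique.
d = tempfile.mkdtemp()
for name, payload in [("a", b"x" * 1000), ("b", b"x" * 1000), ("c", b"y" * 1000)]:
    with open(os.path.join(d, name), "wb") as f:
        f.write(payload)

total, unique = dedup_estimate(d)
print(f"dedup ratio ~ {total / unique:.2f}x")   # 3 chunks, 2 unique: 1.50x
```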
Re: [zfs-discuss] Can ZFS be event-driven or not?
> UFS == Ultimate File System
> ZFS == Zettabyte File System

It's a nit, but:

UFS != Ultimate File System
ZFS != Zettabyte File System

cheers,
--justin
Re: [zfs-discuss] need some explanation
> zpool list doesn't reflect pool usage stats instantly. Why?

This is no different to how UFS behaves. If you rm a file, rm uses the system call unlink(2) to do the work, and that work is asynchronous. In other words, unlink(2) almost immediately returns a successful return code to rm (which can then exit and return the user to a shell prompt), while leaving a kernel thread running to actually finish off freeing up the used space.

Normally you don't see this because it happens very quickly, but once in a while you blow away a 100GB file which may well have a significant amount of metadata associated with it that needs clearing down.

I guess if you wanted to force this to be synchronous you could do something like this:

rm /tank/myfs/bigfile
lockfs /tank/myfs

which would not return until the whole filesystem was flushed back to disk. I don't think you can force a flush at a finer granularity than that. Anyone?

regards,
--justin
Re: [zfs-discuss] ZFS vs. rmvolmgr
> Is there a more elegant approach that tells rmvolmgr to leave certain
> devices alone on a per-disk basis?

I was expecting there to be something in rmmount.conf to allow a specific device or pattern to be excluded, but there appears to be nothing. Maybe this is an RFE?
Re: [zfs-discuss] ZFS needs a viable backup mechanism
> Why aren't you using amanda or something else that uses tar as the
> means by which you do a backup?

Using something like tar to take a backup forgoes the ability to do things like the clever incremental backups that ZFS can achieve, though; e.g. only backing up the few blocks that have changed in a very large file, rather than the whole file regardless.

If 'zfs send' doesn't do something we need, we should fix it rather than avoid it, IMO.

cheers,
--justin
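A toy illustration of the block-level-incremental point (this shows the idea only, not how zfs send is actually implemented): given the previous version's per-block hashes, only the changed blocks need shipping, however large the file is.

```python
import hashlib

BLOCK = 128 * 1024   # chunk size, matching the ZFS default recordsize

def block_hashes(data: bytes):
    """Hash the file contents in fixed-size blocks."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

old = bytes(4 * BLOCK)                 # 512 KiB file of zeros
new = bytearray(old)
new[BLOCK:BLOCK + 4] = b"edit"         # touch block 1 only

old_h, new_h = block_hashes(old), block_hashes(bytes(new))
changed = [i for i, (a, b) in enumerate(zip(old_h, new_h)) if a != b]
print(changed)                         # [1]: one block out of four to ship
```

tar, by contrast, has no previous-version block map, so it must re-read and re-archive the whole file.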