Re: [zfs-discuss] Directory is not accessible

2012-11-26 Thread Justin Stringfellow
unlink(1M)?
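
For example (the path here is purely illustrative, and you'll need root;
unlink(1M) calls unlink(2) directly and skips the safety checks rm performs,
so use it with care):

    # /usr/sbin/unlink /tank/baddir/corruptfile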

cheers,
--justin





From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
<opensolarisisdeadlongliveopensola...@nedharvey.com>
To: Sami Tuominen <sami.tuomi...@tut.fi>; zfs-discuss@opensolaris.org 
<zfs-discuss@opensolaris.org>
Sent: Monday, 26 November 2012, 14:57
Subject: Re: [zfs-discuss] Directory is not accessible
 
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Sami Tuominen
> 
> How can one remove a directory containing corrupt files or a corrupt file
> itself? For me rm just gives input/output error.

I was hoping to see somebody come up with an answer for this ... I would expect 
rm to work...

Maybe you have to rm the parent of the thing you're trying to rm?  But I kinda 
doubt it.

Maybe you need to verify you're rm'ing the right thing?  I believe, if you 
scrub the pool, it should tell you the name of the corrupt things.

Or maybe you're not experiencing a simple cksum mismatch, maybe you're 
experiencing a legitimate IO error.  The rm solution could only possibly work 
to clear up a cksum mismatch.
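
For example (pool name illustrative), a scrub followed by a verbose status
should list the damaged files under "Permanent errors have been detected in
the following files:":

    # zpool scrub tank
    # zpool status -v tank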
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ok for single disk dev box?

2012-08-30 Thread Justin Stringfellow


> has only one drive. If ZFS detects something bad it might kernel panic and 
> lose the whole system right? 

What do you mean by "lose the whole system"? A panic is not a bad thing, and 
also does not imply that the machine will not reboot successfully. It certainly 
doesn't guarantee your OS will be trashed.

> I realize UFS /might/ be ignorant of any corruption but it might be more 
> usable and go happily on its way without noticing? 

UFS has a mount option, onerror, which defines what the OS will do if a 
problem is detected with a given filesystem. I think the default is panic 
anyway. Check the mount_ufs manpage for details.
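
For example, something along these lines (device path illustrative; per
mount_ufs(1M) the onerror values are panic, lock and umount, with panic the
default):

    # mount -F ufs -o onerror=umount /dev/dsk/c0t0d0s6 /mnt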

Your answer is to take regular backups, rather than bury your head in the sand.

cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ok for single disk dev box?

2012-08-30 Thread Justin Stringfellow


> would be very annoying if ZFS barfed on a technicality and I had to reinstall 
> the whole OS because of a kernel panic and an unbootable system.

Is this a known scenario with ZFS then? I can't recall hearing of this 
happening.
I've seen plenty of UFS filesystems dying with "panic: freeing free", and then 
the ensuing fsck-athon convinces the user to just rebuild the fs in question.

cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] number of blocks changes

2012-08-06 Thread Justin Stringfellow


> Can you check whether this happens from /dev/urandom as well?

It does:

finsdb137@root> dd if=/dev/urandom of=oub bs=128k count=1 && while true
> do
> ls -s oub
> sleep 1
> done
0+1 records in
0+1 records out
   1 oub
   1 oub
   1 oub
   1 oub
   1 oub
   4 oub
   4 oub
   4 oub
   4 oub
   4 oub

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] number of blocks changes

2012-08-06 Thread Justin Stringfellow


> I think for the cleanness of the experiment, you should also include
> sync after the dd's, to actually commit your file to the pool.

OK that 'fixes' it:

finsdb137@root> dd if=/dev/random of=ob bs=128k count=1 && sync && while true
> do
> ls -s ob
> sleep 1
> done
0+1 records in
0+1 records out
   4 ob
   4 ob
   4 ob
.. etc.

I guess I knew this had something to do with stuff being flushed to disk; I 
don't know why I didn't think of it myself.

> What is the pool's redundancy setting?
copies=1. Full zfs get below, but in short, it's a basic mirrored root with 
default settings. Hmm, maybe I should mirror root with copies=2.    ;)

> I am not sure what ls -s actually accounts for file's FS-block
> usage, but I wonder if it might include metadata (relevant pieces of
> the block pointer tree individual to the file). Also check if the
> disk usage reported by du -k ob varies similarly, for the fun of it?

Yes, it varies too.

finsdb137@root> dd if=/dev/random of=ob bs=128k count=1 && while true
> do
> ls -s ob
> du -k ob
> sleep 1
> done
0+1 records in
0+1 records out
   1 ob
0   ob
   1 ob
0   ob
   1 ob
0   ob
   1 ob
0   ob
   4 ob
2   ob
   4 ob
2   ob
   4 ob
2   ob
   4 ob
2   ob
   4 ob
2   ob






finsdb137@root> zfs get all rpool/ROOT/s10s_u9wos_14a
NAME   PROPERTY  VALUE  SOURCE
rpool/ROOT/s10s_u9wos_14a  type  filesystem -
rpool/ROOT/s10s_u9wos_14a  creation  Tue Mar  1 15:09 2011  -
rpool/ROOT/s10s_u9wos_14a  used  20.6G  -
rpool/ROOT/s10s_u9wos_14a  available 37.0G  -
rpool/ROOT/s10s_u9wos_14a  referenced    20.6G  -
rpool/ROOT/s10s_u9wos_14a  compressratio 1.00x  -
rpool/ROOT/s10s_u9wos_14a  mounted   yes    -
rpool/ROOT/s10s_u9wos_14a  quota none   default
rpool/ROOT/s10s_u9wos_14a  reservation   none   default
rpool/ROOT/s10s_u9wos_14a  recordsize    128K   default
rpool/ROOT/s10s_u9wos_14a  mountpoint    /  local
rpool/ROOT/s10s_u9wos_14a  sharenfs  off    default
rpool/ROOT/s10s_u9wos_14a  checksum  on default
rpool/ROOT/s10s_u9wos_14a  compression   off    default
rpool/ROOT/s10s_u9wos_14a  atime on default
rpool/ROOT/s10s_u9wos_14a  devices   on default
rpool/ROOT/s10s_u9wos_14a  exec  on default
rpool/ROOT/s10s_u9wos_14a  setuid    on default
rpool/ROOT/s10s_u9wos_14a  readonly  off    default
rpool/ROOT/s10s_u9wos_14a  zoned off    default
rpool/ROOT/s10s_u9wos_14a  snapdir   hidden default
rpool/ROOT/s10s_u9wos_14a  aclmode   groupmask  default
rpool/ROOT/s10s_u9wos_14a  aclinherit    restricted default
rpool/ROOT/s10s_u9wos_14a  canmount  noauto local
rpool/ROOT/s10s_u9wos_14a  shareiscsi    off    default
rpool/ROOT/s10s_u9wos_14a  xattr on default
rpool/ROOT/s10s_u9wos_14a  copies    1  default
rpool/ROOT/s10s_u9wos_14a  version   3  -
rpool/ROOT/s10s_u9wos_14a  utf8only  off    -
rpool/ROOT/s10s_u9wos_14a  normalization none   -
rpool/ROOT/s10s_u9wos_14a  casesensitivity   sensitive  -
rpool/ROOT/s10s_u9wos_14a  vscan off    default
rpool/ROOT/s10s_u9wos_14a  nbmand    off    default
rpool/ROOT/s10s_u9wos_14a  sharesmb  off    default
rpool/ROOT/s10s_u9wos_14a  refquota  none   default
rpool/ROOT/s10s_u9wos_14a  refreservation    none   default
rpool/ROOT/s10s_u9wos_14a  primarycache  all    default
rpool/ROOT/s10s_u9wos_14a  secondarycache    all    default
rpool/ROOT/s10s_u9wos_14a  usedbysnapshots   0  -
rpool/ROOT/s10s_u9wos_14a  usedbydataset 20.6G  -
rpool/ROOT/s10s_u9wos_14a  usedbychildren    0  -
rpool/ROOT/s10s_u9wos_14a  usedbyrefreservation  0  -
rpool/ROOT/s10s_u9wos_14a  logbias   latency    default
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] number of blocks changes

2012-08-03 Thread Justin Stringfellow
While this isn't causing me any problems, I'm curious as to why this is 
happening...:



$ dd if=/dev/random of=ob bs=128k count=1 && while true
> do
> ls -s ob
> sleep 1
> done
0+1 records in
0+1 records out
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   1 ob
   4 ob     <-- changes here
   4 ob
   4 ob
^C
$ ls -l ob
-rw-r--r--   1 justin staff   1040 Aug  3 09:28 ob

I was expecting the '1', since this is a zfs with recordsize=128k. Not sure I 
understand the '4', or why it happens ~30s later.  Can anyone distribute clue 
in my direction?


s10u10, running 144488-06 KU. zfs is v4, pool is v22.


cheers,
--justin



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-11 Thread Justin Stringfellow
> You do realize that the age of the universe is only on the order of
> around 10^18 seconds, do you? Even if you had a trillion CPUs each
> chugging along at 3.0 GHz for all this time, the number of processor
> cycles you will have executed cumulatively is only on the order 10^40,
> still 37 orders of magnitude lower than the chance for a random hash
> collision.

Here we go, boiling the oceans again :)

> Suppose you find a weakness in a specific hash algorithm; you use this
> to create hash collisions and now imagine you store the hash collisions 
> in a zfs dataset with dedup enabled using the same hash algorithm.

Sorry, but isn't this what dedup=verify solves? I don't see the problem here. 
Maybe all that's needed is a comment in the manpage saying hash algorithms 
aren't perfect.
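
As a sketch (dataset name illustrative), turning verification on alongside
the hash is just a property setting:

    # zfs set dedup=sha256,verify tank/data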
 
cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-11 Thread Justin Stringfellow


> The point is that hash functions are many to one and I think the point
> was about that verify wasn't really needed if the hash function is good
> enough.

This is a circular argument really, isn't it? Hash algorithms are never 
perfect, but we're trying to build a perfect one?
 
It seems to me the obvious fix is to use the hash to identify candidates for 
dedup, and then do the actual verify and dedup asynchronously. Perhaps a worker 
thread doing this at low priority?
Did anyone consider this?
 
cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-11 Thread Justin Stringfellow
> This assumes you have low volumes of deduplicated data. As your dedup
> ratio grows, so does the performance hit from dedup=verify. At, say,
> dedupratio=10.0x, on average, every write results in 10 reads.

Well, you can't make an omelette without breaking eggs! Not a very nice one, 
anyway.
 
Yes, dedup is expensive, but much like using O_SYNC, it's a conscious decision 
here to take a performance hit in order to be sure about our data. Moving the 
actual reads to an async thread as I suggested should improve things.
 
cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-11 Thread Justin Stringfellow


> Since there is a finite number of bit patterns per block, have you tried to 
> just calculate the SHA-256 or SHA-512 for every possible bit pattern to see 
> if there is ever a collision?  If you found an algorithm that produced no 
> collisions for any possible block bit pattern, wouldn't that be the win?
 
Perhaps I've missed something, but if there was *never* a collision, you'd have 
stumbled across a rather impressive lossless compression algorithm. I'm pretty 
sure there are some Big Mathematical Rules (Shannon?) that mean this cannot be.
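
To put rough numbers on it: a 128K block has 2^1048576 possible bit patterns,
while SHA-256 has only 2^256 possible outputs, so by the pigeonhole principle
an enormous number of distinct blocks must share a hash. A hash that never
collided over all 128K blocks would amount to lossless compression of any
128K block down to 32 bytes, which is exactly what those rules forbid.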
 
cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-24 Thread Justin Stringfellow

Richard Elling wrote:
> Miles Nordin wrote:
>> "ave" == Andre van Eyssen <an...@purplecow.org> writes:
>> "et" == Erik Trimble <erik.trim...@sun.com> writes:
>> "ea" == Erik Ableson <eable...@mac.com> writes:
>> "edm" == Eric D. Mudama <edmud...@bounceswoosh.org> writes:
>>
>>    ave> The LSI SAS controllers with SATA ports work nicely with
>>    ave> SPARC.
>>
>> I think what you mean is ``some LSI SAS controllers work nicely with
>> SPARC''.  It would help if you tell exactly which one you're using.
>>
>> I thought the LSI 1068 do not work with SPARC (mfi driver, x86 only).
>
> Sun has been using the LSI 1068[E] and its cousin, 1064[E] in
> SPARC machines for many years.  In fact, I can't think of a
> SPARC machine in the current product line that does not use
> either 1068 or 1064 (I'm sure someone will correct me, though ;-)
> -- richard


Might be worth having a look at the T1000 to see what's in there. We used to 
ship those with SATA drives in.
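
If you want to check what a given box actually has fitted, something like
this usually shows the HBA and the driver bound to it (the grep patterns are
only illustrative, and output varies by platform):

    # prtdiag -v | grep -i lsi
    # prtconf -D | grep -i scsi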

cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-23 Thread Justin Stringfellow

> with other Word files.  You will thus end up seeking all over the disk 
> to read _most_ Word files.  Which really sucks.  

snip

> very limited, constrained usage. Disk is just so cheap, that you 
> _really_ have to have an enormous amount of dup before the performance 
> penalties of dedup are countered.

Neither of these holds true for SSDs though, does it? Seeks are essentially 
free, and the devices are not cheap.

cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Justin Stringfellow

> Raw storage space is cheap.  Managing the data is what is expensive.

Not for my customer. Internal accounting means that the storage team gets paid 
for each allocated GB on a monthly basis. They have stacks of IO bandwidth and 
CPU cycles to spare outside of their daily busy period. I can't think of a 
better use of their time than a scheduled dedup.


> Perhaps deduplication is a response to an issue which should be solved 
> elsewhere?

I don't think you can make this generalisation. For most people, yes, but not 
everyone.


cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Justin Stringfellow


> Does anyone know a tool that can look over a dataset and give 
> duplication statistics? I'm not looking for something incredibly 
> efficient but I'd like to know how much it would actually benefit our 

Check out the following blog:

http://blogs.sun.com/erickustarz/entry/how_dedupalicious_is_your_pool
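
(On later builds that include the dedup code, something like this simulates
the dedup table for an existing pool and prints a histogram plus an estimated
ratio, without enabling dedup; pool name illustrative.)

    # zdb -S tank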

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can ZFS be event-driven or not?

2008-02-27 Thread Justin Stringfellow

> UFS == Ultimate File System
> ZFS == Zettabyte File System

it's a nit, but..

UFS != Ultimate File System
ZFS != Zettabyte File System


cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] need some explanation

2007-04-30 Thread Justin Stringfellow



> zpool list doesn't reflect pool usage stats instantly. Why?


This is no different to how UFS behaves.

If you rm a file, this uses the system call unlink(2) to do the work which is 
asynchronous.
In other words, unlink(2) almost immediately returns a successful return code to 
rm (which can then exit, and return the user to a shell prompt), while leaving a 
kernel thread running to actually finish off freeing up the used space. Normally 
you don't see this because it happens very quickly, but once in a while you blow 
a 100GB file away which may well have a significant amount of metadata 
associated with it that needs clearing down.


I guess if you wanted to force this to be synchronous you could do something 
like this:


rm /tank/myfs/bigfile && lockfs /tank/myfs

Which would not return until the whole filesystem was flushed back to disk. I 
don't think you can force a flush at a finer granularity than that. Anyone?
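
Purely as an illustration (path hypothetical), you can also watch the
asynchronous free happening after rm returns:

    # rm /tank/myfs/bigfile && while true; do df -k /tank/myfs; sleep 1; done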


regards,
--justin

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs. rmvolmgr

2007-04-10 Thread Justin Stringfellow



> Is there a more elegant approach that tells rmvolmgr to leave certain
> devices alone on a per disk basis?


I was expecting there to be something in rmmount.conf to allow a specific device 
or pattern to be excluded but there appears to be nothing. Maybe this is an RFE?



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS needs a viable backup mechanism

2006-07-07 Thread Justin Stringfellow

> Why aren't you using amanda or something else that uses
> tar as the means by which you do a backup?

Using something like tar to take a backup forgoes the ability to do things like 
the clever incremental backups that ZFS can achieve, though; e.g. only backing 
up the few blocks that have changed in a very large file rather than the whole 
file regardless. If 'zfs send' doesn't do something we need, we should fix it 
rather than avoid it, IMO.
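
For example, a minimal sketch of that kind of incremental (snapshot and host 
names hypothetical):

    # zfs snapshot tank/fs@tuesday
    # zfs send -i tank/fs@monday tank/fs@tuesday | ssh backuphost zfs receive backup/fs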

cheers,
--justin

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss