Re: How to mount encrypted file system at boot? Why no pass phrase requested

2011-04-26 Thread David Kirkby
On 21 April 2011 13:31, Darren J Moffat darr...@opensolaris.org wrote:

 Now I get a problem. I was expecting to have to enter the pass
 phrase again when attempting to mount the file system, but this is not
 being requested. As you can see, I can mount the file system without the
 pass phrase and read the data on the file system.

 I covered that in the talk last night - in fact we had about a 5 minute
 discussion about why it is this way.

 If you want the key to go away you need to run:

        # zfs key -u rpool/export/home/davek

 drkirkby@laptop:~# zfs mount rpool/export/home/davek
 drkirkby@laptop:~# ls /export/home/davek/foo
 /export/home/davek/foo
 drkirkby@laptop:~#

 This looks wrong to me, but I've no idea how to solve it.

 No it is correct by design.

 As I mentioned last night the reason for this is so that delegated
 administration of certain properties can work for users that don't have the
 'key' delegation and don't have access to the wrapping keys.

 For example changing a mountpoint causes an umount followed by a mount.
  There are other changes that under the covers can cause a filesystem to be
 temporarily unmounted and remounted.

Thanks. You did lose me in some places - I guess that was one of them.

 The next issue is how do I get the file system to mount when the
 machine is booted? I want to supply the pass phrase by typing it in,
 rather than storing it on a USB stick or using some other similar method.

 Since this is your user home directory the ideal way would be a PAM module
 that ran during user login and requested the passphrase for the ZFS
 encrypted home dir.

 There isn't one in Solaris 11 Express (snv_151a) at this time.

 Any ideas what I need to do to get this file system to request the
 pass phrase before mounting the file system?

 There is source for a prototype PAM module in the old opensolaris.org
 zfs-crypto repository:

 http://src.opensolaris.org/source/history/zfs-crypto/phase2/usr/src/lib/pam_modules/

 You would need to take a clone of that repository and check out changeset
  6749:6dded109490e  and see if that old PAM module could be hacked into
 submission.  Note that it uses private interfaces and doing so is not
 supported by any Oracle support contract you have.

 --
 Darren J Moffat

I don't fancy going to that length. Are there simpler options that
don't require hacking the system so much? I guess one way is to store
very little in one's home directory, and keep all potentially
sensitive data in another file system. I'm not overly concerned about
what might be in $HOME/.bash_history or similar.
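
A sketch of that simpler option, assuming the dataset layout shown above
(the child dataset name "private" is only an illustration, and the syntax
is the Solaris 11 Express zfs(1M) one): keep the sensitive data in a
separate passphrase-encrypted dataset and load its key by hand only when
it is needed.

# zfs create -o encryption=on -o keysource=passphrase,prompt \
      rpool/export/home/davek/private          # prompts for a new passphrase
# zfs key -u rpool/export/home/davek/private   # unload the key when not in use
# zfs key -l rpool/export/home/davek/private   # later: prompts for the passphrase
# zfs mount rpool/export/home/davek/private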



Dave
___
zfs-crypto-discuss mailing list
zfs-crypto-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Nikola M.
On 04/26/11 01:56 AM, Lamp Zy wrote:
 Hi,

 One of my drives failed in Raidz2 with two hot spares:
What are your zpool/zfs versions? (Run 'zpool upgrade' and 'zfs upgrade'
and hit Ctrl+C.)
The latest zpool/zfs versions available in all OpenSolaris-based
distributions are zpool 28 and zfs version 5. (That is why one should not
upgrade the S11Ex zpool/zfs versions if one wants to keep using, or have
installed alongside in multiple ZFS BEs, other OpenSolaris-based
distributions.)

What OS are you using with ZFS?
Do you use Solaris 10/update release, Solaris 11 Express, OpenIndiana
oi_148 dev / 148b with Illumos, OpenSolaris 2009.06/snv_134b, Nexenta,
Nexenta Community, Schillix, FreeBSD, or Linux zfs-fuse? (I guess you are
still not using Linux with the ZFS kernel module, but just to mention it
is available... and OS X too.)
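
For reference, the versions can also be read without touching an upgrade
prompt (a sketch; 'tank' and 'tank/fs' are placeholder names):

# zpool get version tank       # pool version, e.g. 28
# zfs get version tank/fs      # filesystem version, e.g. 5
# zpool upgrade -v             # lists the versions this build supports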

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Ian Collins

 On 04/26/11 04:47 PM, Erik Trimble wrote:

On 4/25/2011 6:23 PM, Ian Collins wrote:

   On 04/26/11 01:13 PM, Fred Liu wrote:

Hmm, it seems dedup is pool-based, not filesystem-based.

That's correct. Although it can be turned off and on at the filesystem
level (assuming it is enabled for the pool).

Which is effectively the same as choosing per-filesystem dedup, just
the inverse. You turn it on at the pool level and off at the filesystem
level, which is identical to the "off at the pool level, on at the
filesystem level" model that NetApp uses.
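
As a sketch of that inverse arrangement (the pool name 'tank' and the
child filesystem name are placeholders, not from this thread):

# zfs set dedup=on tank          # set on the pool's root dataset; children inherit it
# zfs set dedup=off tank/home    # opt an individual filesystem out
# zfs get -r dedup tank          # show which datasets currently dedup new writes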


If it could have finer granularity (like per filesystem), that would be great!
It is a pity! NetApp is sweet in this aspect.


So what happens to user B's quota if user B stores a ton of data that is
a duplicate of user A's data and then user A deletes the original?

Actually, right now, nothing happens to B's quota. He's always charged
the un-deduped amount for his quota usage, whether or not dedup is
enabled, and regardless of how much of his data is actually deduped.
Which is as it should be, as quotas are about limiting how much a user
is consuming, not how much the backend needs to store that data consumption.


That was the point I was making: quota on deduped usage does not make sense.

I was curious how he proposed doing it the other way!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] aclmode - no zfs in heterogeneous networks anymore?

2011-04-26 Thread achim...@googlemail.com

Hi!

We are setting up a new file server on an OpenIndiana box (oi_148). The
pool is running version 28, so the aclmode option is gone. The server
has to serve files to Linux, OS X and Windows. Because of the missing
aclmode option, we are going nuts with the file permissions.

I read a whole lot about the problem and the pros and cons of the
decision to drop that option from ZFS, but I have found absolutely nothing
about a solution or workaround.

The problem is that GNOME's Nautilus as well as OS X's Finder perform a
chmod after writing a file over NFS, causing all ACLs to vanish.

If there is no solution, ZFS seems to be dead. How do you solve this
problem?
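
Nothing brings aclmode back, but for inspecting the damage and patching
ACLs up by hand, something like the following may help (a sketch only;
the dataset, path and user names are placeholders):

# zfs set aclinherit=passthrough tank/share    # newly created files keep the inherited ACEs unmodified
# ls -V /tank/share/somefile                   # show whatever ACL a client-side chmod left behind
# chmod A+user:webteam:read_data/write_data/execute:allow /tank/share/somefile   # re-add an ACE by hand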

Achim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] aclmode - no zfs in heterogeneous networks anymore?

2011-04-26 Thread Nikola M.
I am forwarding this to the openindiana-disc...@openindiana.org list,
in the hope of a wider audience for the question.

 Original Message 
Message-ID: 4db68e08.9040...@googlemail.com
Date: Tue, 26 Apr 2011 11:19:04 +0200
From: achim...@googlemail.com achim...@googlemail.com
List-Id: zfs-discuss.opensolaris.org

Hi!

We are setting up a new file server on an OpenIndiana box (oi_148). The
pool is running version 28, so the aclmode option is gone. The server
has to serve files to Linux, OS X and Windows. Because of the missing
aclmode option, we are going nuts with the file permissions.

I read a whole lot about the problem and the pros and cons of the
decision to drop that option from ZFS, but I have found absolutely nothing
about a solution or workaround.

The problem is that GNOME's Nautilus as well as OS X's Finder perform a
chmod after writing a file over NFS, causing all ACLs to vanish.

If there is no solution, ZFS seems to be dead. How do you solve this
problem?

Achim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] aclmode - no zfs in heterogeneous networks anymore?

2011-04-26 Thread Frank Lahm
2011/4/26 achim...@googlemail.com achim...@googlemail.com:

 Hi!

 We are setting up a new file server on an OpenIndiana box (oi_148). The
 pool is running version 28, so the aclmode option is gone. The server
 has to serve files to Linux, OS X and Windows. Because of the missing
 aclmode option, we are going nuts with the file permissions.

 I read a whole lot about the problem and the pros and cons of the
 decision to drop that option from ZFS, but I have found absolutely nothing
 about a solution or workaround.

 The problem is that GNOME's Nautilus as well as OS X's Finder perform a
 chmod after writing a file over NFS, causing all ACLs to vanish.

 If there is no solution, ZFS seems to be dead. How do you solve this
 problem?

Use Netatalk to give Macs native AFP support. The latest Netatalk has
a workaround (basically a chmod(3) wrapper) built in.

Best!
-f
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Fred Liu


 -Original Message-
 From: Erik Trimble [mailto:erik.trim...@oracle.com]
 Sent: Tuesday, April 26, 2011 12:47
 To: Ian Collins
 Cc: Fred Liu; ZFS discuss
 Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
 with quota?
 
 On 4/25/2011 6:23 PM, Ian Collins wrote:
On 04/26/11 01:13 PM, Fred Liu wrote:
  Hmm, it seems dedup is pool-based, not filesystem-based.
  That's correct. Although it can be turned off and on at the
 filesystem
  level (assuming it is enabled for the pool).
 Which is effectively the same as choosing per-filesystem dedup.  Just
 the inverse. You turn it on at the pool level, and off at the
 filesystem
 level, which is identical to off at the pool level, on at the
 filesystem level that NetApp does.

My original thought was just to enable dedup on one file system to check
whether it is mature enough for the production env. And I have only one pool.
If dedup were filesystem-based, the effect of dedup would be confined to
that one file system and would not propagate to the whole pool. Just
disabling dedup cannot get rid of all the effects (such as the possible
performance degradation, etc.), because the already-dedup'd data is still
there and the DDT is still there. The only thorough way I can think of is
to remove all the dedup'd data entirely. But is that really the thorough way?

And also the dedup space saving is kind of indirect.
We cannot directly get the space saving in the file system where dedup
is actually enabled, because it is pool-based. Even from the pool
perspective it is still sort of indirect and obscure in my opinion: the
real space saving is the absolute delta between the output of 'zpool list'
and the sum of 'du' over all the folders in the pool (or 'df' on the mount
point folder; not sure whether a percentage like 123% will occur or
not... grinning ^:^ ).

But in NetApp, we can use 'df -s' to directly and easily get the space saving.
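
For the pool-wide figure, ZFS does expose the saving directly (a sketch;
'tank' is a placeholder pool name):

# zpool list tank               # the DEDUP column shows the pool-wide dedup ratio
# zpool get dedupratio tank     # the same number as a property, e.g. 1.29x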

 
  If it could have finer granularity (like per filesystem), that would
  be great!
  It is a pity! NetApp is sweet in this aspect.
 
  So what happens to user B's quota if user B stores a ton of data that
 is
  a duplicate of user A's data and then user A deletes the original?
 Actually, right now, nothing happens to B's quota. He's always charged
 the un-deduped amount for his quota usage, whether or not dedup is
 enabled, and regardless of how much of his data is actually deduped.
 Which is as it should be, as quotas are about limiting how much a user
 is consuming, not how much the backend needs to store that data
 consumption.
 
 e.g.
 
 A, B, C, and D all have 100MB of data in the pool, with dedup on.
 
 20MB of storage has a dedup factor of 3:1 (common to A, B, and C)
 50MB of storage has a dedup factor of 2:1 (common to A and B)
 
 Thus, the amount of unique data would be:
 
 A: 100 - 20 - 50 = 30MB
 B: 100 - 20 - 50 = 30MB
 C: 100 - 20 = 80MB
 D: 100MB
 
 Summing it all up, you would have an actual storage consumption of 70MB
 (50+20 deduped) + 30+30+80+100 (unique data) = 310MB of actual storage,
 for 400MB of apparent storage (i.e. a dedup ratio of 1.29:1).
 
 A, B, C, and D would each still have a quota usage of 100MB.


It is true, quota is concerned with logical data, not physical data.
Let's assume an interesting scenario -- say the pool is 100% full in
logical data (i.e. 'df' tells you 100% used) but not full in physical
data ('zpool list' tells you there is still some space available). Can
we continue writing data into this pool?

Anybody interested in doing this experiment? ;-)

Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arcstat updates

2011-04-26 Thread Volker A. Brandt
Hello Richard!


 I've been working on merging the Joyent arcstat enhancements with some of my
 own and am now to the point where it is time to broaden the requirements
 gathering. The result is to be merged into the illumos tree.

Great news!

 1. Should there be flag compatibility with vmstat, iostat, mpstat, and
 friends?

Don't bother.  I find that I need to look at the man page anyway
if I want to do anything that goes beyond -i 1. :-)

 2. What is missing?

Nothing obvious to me.

 3. Is it ok if the man page explains the meanings of each field, even though 
 it
 might be many pages long?

Yes please!!

 4. Is there a common subset of columns that are regularly used that would 
 justify
 a shortcut option? Or do we even need shortcuts? (eg -x)

No.  Anything I need more than 1-2 times I will turn into a shell
alias anyway (alias zlist zfs list -tall -o mounted,mountpoint,name :-).

 5. Who wants to help with this little project?

My first reaction was ENOTIME. :-(  What kind of help do you need?


Regards -- Volker
-- 

Volker A. Brandt   Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: v...@bb-c.de
Commercial register: Amtsgericht Bonn, HRB 10513  Shoe size: 46
Managing directors: Rainer J. H. Brandt and Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Cindy Swearingen

Hi--

I don't know why the spare isn't kicking in automatically; it should.

A documented workaround is to outright replace the failed disk with one
of the spares, like this:

# zpool replace fwgpool0 c4t5000C5001128FE4Dd0 c4t5000C50014D70072d0

The autoreplace pool property has nothing to do with automatic spare
replacement. When this property is enabled, a new disk inserted in the
same physical location as the failed disk is automatically labeled and
used as the replacement, so there is no need to run the zpool replace
command manually.
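
To see or change that property on this pool (a sketch; no output shown):

# zpool get autoreplace fwgpool0
# zpool set autoreplace=on fwgpool0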

Then, you can find the original failed c4t5000C5001128FE4Dd0 disk
and physically replace it when you have time. You could then add this
disk back into the pool as the new spare, like this:

# zpool add fwgpool0 spare c4t5000C5001128FE4Dd0


Thanks,

Cindy
On 04/25/11 17:56, Lamp Zy wrote:

Hi,

One of my drives failed in Raidz2 with two hot spares:

# zpool status
  pool: fwgpool0
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Mon Apr 25 14:45:44 2011

config:

NAME   STATE READ WRITE CKSUM
fwgpool0   DEGRADED 0 0 0
  raidz2   DEGRADED 0 0 0
c4t5000C500108B406Ad0  ONLINE   0 0 0
c4t5000C50010F436E2d0  ONLINE   0 0 0
c4t5000C50011215B6Ed0  ONLINE   0 0 0
c4t5000C50011234715d0  ONLINE   0 0 0
c4t5000C50011252B4Ad0  ONLINE   0 0 0
c4t5000C500112749EDd0  ONLINE   0 0 0
c4t5000C5001128FE4Dd0  UNAVAIL  0 0 0  cannot open
c4t5000C500112C4959d0  ONLINE   0 0 0
c4t5000C50011318199d0  ONLINE   0 0 0
c4t5000C500113C0E9Dd0  ONLINE   0 0 0
c4t5000C500113D0229d0  ONLINE   0 0 0
c4t5000C500113E97B8d0  ONLINE   0 0 0
c4t5000C50014D065A9d0  ONLINE   0 0 0
c4t5000C50014D0B3B9d0  ONLINE   0 0 0
c4t5000C50014D55DEFd0  ONLINE   0 0 0
c4t5000C50014D642B7d0  ONLINE   0 0 0
c4t5000C50014D64521d0  ONLINE   0 0 0
c4t5000C50014D69C14d0  ONLINE   0 0 0
c4t5000C50014D6B2CFd0  ONLINE   0 0 0
c4t5000C50014D6C6D7d0  ONLINE   0 0 0
c4t5000C50014D6D486d0  ONLINE   0 0 0
c4t5000C50014D6D77Fd0  ONLINE   0 0 0
spares
  c4t5000C50014D70072d0AVAIL
  c4t5000C50014D7058Dd0AVAIL

errors: No known data errors


I'd expect the spare drives to auto-replace the failed one but this is 
not happening.


What am I missing?

I really would like to get the pool back in a healthy state using the 
spare drives before trying to identify which one is the failed drive in 
the storage array and trying to replace it. How do I do this?


Thanks for any hints.

--
Peter
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Erik Trimble
On 4/26/2011 3:59 AM, Fred Liu wrote:

 -Original Message-
 From: Erik Trimble [mailto:erik.trim...@oracle.com]
 Sent: Tuesday, April 26, 2011 12:47
 To: Ian Collins
 Cc: Fred Liu; ZFS discuss
 Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
 with quota?

 On 4/25/2011 6:23 PM, Ian Collins wrote:
   On 04/26/11 01:13 PM, Fred Liu wrote:
 Hmm, it seems dedup is pool-based, not filesystem-based.
 That's correct. Although it can be turned off and on at the
 filesystem
 level (assuming it is enabled for the pool).
 Which is effectively the same as choosing per-filesystem dedup.  Just
 the inverse. You turn it on at the pool level, and off at the
 filesystem
 level, which is identical to off at the pool level, on at the
 filesystem level that NetApp does.
 My original thought was just to enable dedup on one file system to check
 whether it is mature enough for the production env. And I have only one pool.
 If dedup were filesystem-based, the effect of dedup would be confined to
 that one file system and would not propagate to the whole pool. Just
 disabling dedup cannot get rid of all the effects (such as the possible
 performance degradation, etc.), because the already-dedup'd data is still
 there and the DDT is still there. The only thorough way I can think of is
 to remove all the dedup'd data entirely. But is that really the thorough way?
You can do that now. Enable Dedup at the pool level. Turn it OFF on all
the existing filesystems. Make a new test filesystem, and run your tests.

Remember, only data written AFTER the dedup property is turned on will be
de-duped. Existing data will NOT be. And, though dedup is enabled at the
pool level, it will only consider data written into filesystems that
have the dedup property set to on.

Thus, in your case, writing to the single filesystem with dedup on will
NOT have ZFS check for duplicates from the other filesystems. It will
check only inside itself, as it's the only filesystem with dedup enabled.

If the experiment fails, you can safely destroy your test dedup
filesystem, then unset dedup at the pool level, and you're fine.
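
A minimal sketch of that procedure (the pool name 'tank' and the
filesystem names are placeholders):

# zfs set dedup=on tank                    # pool level, i.e. the root dataset
# zfs set dedup=off tank/existing-fs       # repeat for each existing filesystem
# zfs create -o dedup=on tank/dedup-test   # isolated test filesystem
  (run the tests; watch 'zpool get dedupratio tank')
# zfs destroy tank/dedup-test              # back out if the experiment fails
# zfs set dedup=off tank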


 And also the dedup space saving is kind of indirect.
 We cannot directly get the space saving in the file system where dedup
 is actually enabled, because it is pool-based. Even from the pool
 perspective it is still sort of indirect and obscure in my opinion: the
 real space saving is the absolute delta between the output of 'zpool list'
 and the sum of 'du' over all the folders in the pool (or 'df' on the mount
 point folder; not sure whether a percentage like 123% will occur or
 not... grinning ^:^ ).

 But in NetApp, we can use 'df -s' to directly and easily get the space saving.
That is true. Honestly, however, it would be hard to do this on a
per-filesystem basis. ZFS allows for the creation of an arbitrary number
of filesystems in a pool, far higher than NetApp does. The result is
that the filesystem concept is much more flexible in ZFS. The downside
is that keeping dedup statistics for a given arbitrary set of data is
logistically difficult.

An analogy with NetApp is thus: Can you use any tool to find the dedup
ratio of an arbitrary directory tree INSIDE a NetApp filesystem?


 It is true, quota is concerned with logical data, not physical data.
 Let's assume an interesting scenario -- say the pool is 100% full in
 logical data (i.e. 'df' tells you 100% used) but not full in physical
 data ('zpool list' tells you there is still some space available). Can
 we continue writing data into this pool?

Sure, you can keep writing to the volume. What matters to the OS is what
*it* thinks, not what some userland app thinks.


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Fred Liu


-Original Message-
From: Erik Trimble [mailto:erik.trim...@oracle.com] 
Sent: Wednesday, April 27, 2011 12:07 AM
To: Fred Liu
Cc: Ian Collins; ZFS discuss
Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

On 4/26/2011 3:59 AM, Fred Liu wrote:

 -Original Message-
 From: Erik Trimble [mailto:erik.trim...@oracle.com]
 Sent: Tuesday, April 26, 2011 12:47
 To: Ian Collins
 Cc: Fred Liu; ZFS discuss
 Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
 with quota?

 On 4/25/2011 6:23 PM, Ian Collins wrote:
   On 04/26/11 01:13 PM, Fred Liu wrote:
 Hmm, it seems dedup is pool-based, not filesystem-based.
 That's correct. Although it can be turned off and on at the
 filesystem
 level (assuming it is enabled for the pool).
 Which is effectively the same as choosing per-filesystem dedup.  Just
 the inverse. You turn it on at the pool level, and off at the
 filesystem
 level, which is identical to off at the pool level, on at the
 filesystem level that NetApp does.
 My original thought was just to enable dedup on one file system to check
 whether it is mature enough for the production env. And I have only one pool.
 If dedup were filesystem-based, the effect of dedup would be confined to
 that one file system and would not propagate to the whole pool. Just
 disabling dedup cannot get rid of all the effects (such as the possible
 performance degradation, etc.), because the already-dedup'd data is still
 there and the DDT is still there. The only thorough way I can think of is
 to remove all the dedup'd data entirely. But is that really the thorough way?
You can do that now. Enable Dedup at the pool level. Turn it OFF on all
the existing filesystems. Make a new test filesystem, and run your tests.

Remember, only data written AFTER the dedup property is turned on will be
de-duped. Existing data will NOT be. And, though dedup is enabled at the
pool level, it will only consider data written into filesystems that
have the dedup property set to on.

Thus, in your case, writing to the single filesystem with dedup on will
NOT have ZFS check for duplicates from the other filesystems. It will
check only inside itself, as it's the only filesystem with dedup enabled.

If the experiment fails, you can safely destroy your test dedup
filesystem, then unset dedup at the pool level, and you're fine.


Thanks. I will give it a try.


 And also the dedup space saving is kind of indirect.
 We cannot directly get the space saving in the file system where dedup
 is actually enabled, because it is pool-based. Even from the pool
 perspective it is still sort of indirect and obscure in my opinion: the
 real space saving is the absolute delta between the output of 'zpool list'
 and the sum of 'du' over all the folders in the pool (or 'df' on the mount
 point folder; not sure whether a percentage like 123% will occur or
 not... grinning ^:^ ).

 But in NetApp, we can use 'df -s' to directly and easily get the space saving.
That is true. Honestly, however, it would be hard to do this on a
per-filesystem basis. ZFS allows for the creation of an arbitrary number
of filesystems in a pool, far higher than NetApp does. The result is
that the filesystem concept is much more flexible in ZFS. The downside
is that keeping dedup statistics for a given arbitrary set of data is
logistically difficult.

An analogy with NetApp is thus: Can you use any tool to find the dedup
ratio of an arbitrary directory tree INSIDE a NetApp filesystem?

That is true. There is no apples-to-apples counterpart in NetApp for a
ZFS file system. If we treat a NetApp 'volume' as the counterpart of a
ZFS 'file system', then it is doable, because dedup in NetApp is
volume-based.

 It is true, quota is concerned with logical data, not physical data.
 Let's assume an interesting scenario -- say the pool is 100% full in
 logical data (i.e. 'df' tells you 100% used) but not full in physical
 data ('zpool list' tells you there is still some space available). Can
 we continue writing data into this pool?

Sure, you can keep writing to the volume. What matters to the OS is what
*it* thinks, not what some userland app thinks.

OK. And then what will the output of 'df' be?

Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Erik Trimble
On 4/26/2011 9:29 AM, Fred Liu wrote:
 From: Erik Trimble [mailto:erik.trim...@oracle.com] 
 It is true, quota is concerned with logical data, not physical data.
 Let's assume an interesting scenario -- say the pool is 100% full in
 logical data (i.e. 'df' tells you 100% used) but not full in physical
 data ('zpool list' tells you there is still some space available). Can
 we continue writing data into this pool?

 Sure, you can keep writing to the volume. What matters to the OS is what
 *it* thinks, not what some userland app thinks.

 OK. And then what will the output of 'df' be?

 Thanks.

 Fred
110% full. Or whatever. df will just keep reporting what it sees. Even
if what it *thinks* doesn't make sense to the human reading it.


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Drive replacement speed

2011-04-26 Thread Brandon High
The last resilver finished after 50 hours. Ouch.

I'm onto the next device now, which seems to be progressing much, much better.

The current tunings that I'm using right now are:
echo zfs_resilver_delay/W0t0 | mdb -kw
echo zfs_resilver_min_time_ms/W0t2 | pfexec mdb -kw

Things could slow down, but at 13 hours in, the resilver has been
managing ~ 100M/s and is 70% done.
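
For anyone repeating this, the current values can be read back before
(and after) writing them; these are private kernel tunables, so treat
the following as a sketch:

# echo zfs_resilver_delay/D | mdb -k          # print the current value in decimal
# echo zfs_resilver_min_time_ms/D | mdb -k
# echo zfs_resilver_delay/W0t0 | mdb -kw      # write a new value, as above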

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Richard Elling

On Apr 26, 2011, at 8:22 AM, Cindy Swearingen wrote:

 Hi--
 
 I don't know why the spare isn't kicking in automatically, it should.

This can happen if the FMA agents aren't working properly.

FYI, in NexentaStor we have added a zfs-monitor FMA agent to check the
health of disks in use for ZFS and notice when they are no longer responding 
to reads.
 -- richard
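
A quick way to check whether FMA is at least alive (standard Solaris
commands; a sketch, not specific to this particular failure):

# svcs fmd          # the fault manager service should be online
# fmadm faulty      # resources FMA currently believes are faulted
# fmstat            # per-module statistics, including the zfs-diagnosis and zfs-retire agents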

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arcstat updates

2011-04-26 Thread Peter Tribble
On Mon, Apr 25, 2011 at 11:58 PM, Richard Elling
richard.ell...@nexenta.com wrote:
 Hi ZFSers,
 I've been working on merging the Joyent arcstat enhancements with some of my 
 own
 and am now to the point where it is time to broaden the requirements 
 gathering. The result
 is to be merged into the illumos tree.

 arcstat is a perl script to show the value of ARC kstats as they change over 
 time. This is
 similar to the ideas behind mpstat, iostat, vmstat, and friends.

 The current usage is:

    Usage: arcstat [-hvx] [-f fields] [-o file] [interval [count]]

    Field definitions are as follows:

[Lots of 'em.]

 Some questions for the community:
 1. Should there be flag compatibility with vmstat, iostat, mpstat, and 
 friends?

Beyond interval and count, I'm not sure there's much in the way of commonality.
Perhaps copy -T for timestamping.

 2. What is missing?

 3. Is it ok if the man page explains the meanings of each field, even though 
 it
 might be many pages long?

Definitely. Unless the meaning of each field is documented elsewhere.

 4. Is there a common subset of columns that are regularly used that would 
 justify
 a shortcut option? Or do we even need shortcuts? (eg -x)

If I were a user of such a tool, I wouldn't know where to start. Which
fields ought I to be looking at? There are a whole bunch of them. What I
would expect is a handful of standard reports (maybe one showing the
sizes, one showing the ARC efficiency, another one for the L2ARC).
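
In the meantime, the raw counters behind any such report can be pulled
straight from the arcstats kstats (a sketch, independent of the proposed
tool):

# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
# kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses
# kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses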

 5. Who wants to help with this little project?

I'm definitely interested in emulating arcstat in jkstat. OK, I have
an old version,
but it's pretty much out of date and I need to refresh it.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Paul Kraus
On Tue, Apr 26, 2011 at 4:59 PM, Richard Elling
richard.ell...@gmail.com wrote:

 On Apr 26, 2011, at 8:22 AM, Cindy Swearingen wrote:

 Hi--

 I don't know why the spare isn't kicking in automatically, it should.

 This can happen if the FMA agents aren't working properly.

 FYI, in NexentaStor we have added a zfs-monitor FMA agent to check the
 health of disks in use for ZFS and notice when they are no longer responding
 to reads.

I just recently (this past week) had a very similar failure: a zpool
consisting of two raidz2 vdevs and two hot spare drives. Each raidz2
vdev consists of 10 drives (I know, not the best layout, but the
activity is large sequential writes and reads and we needed the
capacity). We had a drive fail in one of the vdevs and one of the hot
spares automatically went into action (the 'spare' device appeared
within the vdev and the hot spare drive resilvered). A short time later
a second drive in the same vdev failed. No action by any hot spare. The
system was running Solaris 10U8 with no additional patches.

I opened a case with Oracle and they told me that the hot spare
*should* have dealt with the second failure. We replaced the first
(hot spared) drive with zpool replace and it resilvered fine. Then we
replaced the second (non hot spared) drive with zpool replace and the
system hung. I suspected the mpt (multipathing) driver for the SATA
drives in the J4400, there have been some huge improvements in that
driver since 10U8. After rebooting the drive appeared replaced and was
resilvering.

Oracle support chalked the hot spare issue up to an FMA problem
but could not duplicate it in the lab. We have since upgraded to 10U9
+ the latest CPU (April 2011) and are hoping both the hot spare issue
and the mpt driver issue are fixed.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Fred Liu


-Original Message-
From: Erik Trimble [mailto:erik.trim...@oracle.com] 
Sent: Wednesday, April 27, 2011 1:06 AM
To: Fred Liu
Cc: Ian Collins; ZFS discuss
Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

On 4/26/2011 9:29 AM, Fred Liu wrote:
 From: Erik Trimble [mailto:erik.trim...@oracle.com] 
 It is true, quota is concerned with logical data, not physical data.
 Let's assume an interesting scenario -- say the pool is 100% full in
 logical data (i.e. 'df' tells you 100% used) but not full in physical
 data ('zpool list' tells you there is still some space available). Can
 we continue writing data into this pool?

 Sure, you can keep writing to the volume. What matters to the OS is what
 *it* thinks, not what some userland app thinks.

 OK. And then what will the output of 'df' be?

 Thanks.

 Fred
110% full. Or whatever. df will just keep reporting what it sees. Even
if what it *thinks* doesn't make sense to the human reading it.

Gotcha!

Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-26 Thread Roy Sigurd Karlsbakk
- Original Message -
 On 04/25/11 11:55, Erik Trimble wrote:
  On 4/25/2011 8:20 AM, Edward Ned Harvey wrote:
   And one more comment: Based on what's below, it seems that the DDT
   gets stored on the cache device and also in RAM. Is that correct?
   What if you didn't have a cache device? Shouldn't it *always* be in
   RAM? And doesn't the cache device get wiped every time you reboot?
   It seems to me like putting the DDT on the cache device would be
   harmful... Is that really how it is?
  Nope. The DDT is stored only in one place: cache device if present,
  /or/ RAM otherwise (technically, ARC, but that's in RAM). If a cache
  device is present, the DDT is stored there, BUT RAM also must store a
  basic lookup table for the DDT (yea, I know, a lookup table for a
  lookup table).
 No, that's not true. The DDT is just like any other ZFS metadata and
 can be split over the ARC,
 cache device (L2ARC) and the main pool devices. An infrequently
 referenced DDT block will get
 evicted from the ARC to the L2ARC then evicted from the L2ARC.
and with the default size of a ZFS configuration's metadata being (RAM
size - 1GB) / 4, without tuning, and with 128kB blocks all over, you'll
need some 5-6GB+ per terabyte stored.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for every pedagogue to
avoid excessive use of idioms of foreign origin. In most cases adequate
and relevant synonyms exist in Norwegian.
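
For anyone wanting to see how big the DDT actually is on an existing
pool, zdb can print it (a sketch; 'tank' is a placeholder pool name, and
zdb output formats vary between releases):

# zdb -D tank        # one-line DDT summary, including the dedup ratio
# zdb -DD tank       # full DDT histogram, from which the table size can be estimated
# kstat -p zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit
___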
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss