Re: [zfs-discuss] Server upgrade

2012-02-17 Thread Paul B. Henson
On Thu, Feb 16, 2012 at 12:38:07PM -0800, David Dyer-Bennet wrote:

 Thanks.  Given the pricing for commercial Solaris versions, I don't think
 moving to them is likely to ever be important to me.  It looks like OI and
 Nexenta are the viable choices I have to look at.

Another option soon to be available is Illumian:

http://www.illumian.org/

It's roughly the same as OpenIndiana but using Debian packaging rather
than IPS.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] S10 version question

2011-10-06 Thread Paul B. Henson
On Wed, Oct 05, 2011 at 07:28:02PM -0700, Paul Kraus wrote:
 
 I have been told by Oracle Support (not first line, but someone
 from engineering in response to an escalation) that the code is done
 to put aclmode back in, and that an IDR can probably be cut against
 the 10U10 kernel. They are finding out when it will be integrated.

I just got an update on my ticket indicating they plan to restore
aclmode in U11, and asking if I can just wait for that. I'm going to
push for a U10 patch to restore it and an IDR pending that.

What a waste of effort all around :(, it really shouldn't have been
removed in the first place IMHO.




Re: [zfs-discuss] S10 version question

2011-10-05 Thread Paul B. Henson
On Thu, Sep 29, 2011 at 07:13:40PM -0700, Paul Kraus wrote:

 Another potential difference ... I have been told by Oracle Support
 (but have not yet confirmed) that just running the latest zfs code
 (Solaris 10U10) will disable the aclmode property, even if you do not
 upgrade the zpool version beyond 22. I expect to test this next week,
 as we _need_ ACLs to work for our data.

I haven't installed U10, but have confirmed that installing the U10
kernel patch removes aclmode :(. Didn't expect that Solaris 11 change to
be backported... I personally have SR 3-4631579271 open requesting that
breakage be fixed, referencing CR #7002239 which is an RFE to restore
aclmode to Solaris 11. If you have a service contract and care (or even if
you don't care but would like to help out those who do ;) ), open a
ticket requesting the fix for CR #7002239 and get some weight behind it
:).

Support's recommended workaround is basically "don't use chmod" 8-/. As
if you could control all the things that call chmod behind your back, let
alone the NFS exclusive open issue, which stomps on your ACL no matter
*what* you do <sigh>...

What makes this even more annoying is that U10 backports
ngroups_max=1024 support, which I could *really* *really* use...




Re: [zfs-discuss] aclmode gone in S10u10?

2011-09-13 Thread Paul B. Henson

On 9/13/2011 5:07 AM, Paul Kraus wrote:


 Patch-ID# 144500-19 is the kernel update that is the kernel from
 10U10


Yep, the guy posting on sunmanagers confirmed that was the patch he
installed which broke aclmode.

Did update 10 sneak out under cover of darkness or what? I didn't see
any announcements or chatter about it, Google doesn't find anything, and
the Oracle download site still only shows update 9:

http://www.oracle.com/technetwork/server-storage/solaris/overview/solaris-latest-version-170418.html


 Thank you for flagging this, as Oracle support is telling me I have
 to update to this release to get zpool version 26, which fixes a zfs bug we
 are running into, but if it breaks the ACL inheritance we have been
 using, then it is a non-starter.


Yup. Fortunately, there are no critical bugs (that Oracle is willing to
fix 8-/) that I've been waiting for. I thought I'd be able to keep an
up-to-date S10 install going while I figure out what to do next; guess
not :(.

He had already opened a support ticket; they responded:

-
ZFS appears to be the only file system supporting NFSv4 ACLs
that attempts to preserve ACLs during chmod(2) operations.
Unfortunately, this requires the ACL to be modified in ways that are
confusing to customers and the time has come to stop the confusion and 
to just discard the ACL during chmod(2) operations. This implies that 
the ZFS aclmode property will no longer be needed and will be removed 
from ZFS.


This functionality is targeted to be back in Solaris 11 - as per
CR7002239 want ZFS aclmode property back
-

Interesting that there's already a new CR to put it back -- I thought 
that bridge was already burned. It's already back in illumos. I wonder 
how long it will take for it to be put back in S10; or I guess it could be one 
of those CRs that never gets resolved. I suppose I'll open my own 
ticket to voice support.



 runs the system out of RAM. The snapshot is a partial zfs recv. I am
 told this is a known bug (destruction of large snapshots can run the
 system out of RAM as the destroy operation commits as one TXG). There
 is a fix, but it is in the on-disk format, so just zpool upgrading to
 version 26 will not fix existing snapshots. We only have 32 GB in
 this system and the faulty snapshot *should* be about 2.5 TB.


Hmm, if updating the zpool won't fix the existing snapshot, how is 
support telling you to recover? Is it going to be one of those wipe and 
rebuild resolutions 8-/? Good luck...


I imagine a pool upgraded to version 26 will no longer be compatible 
with other zfs implementations.




Re: [zfs-discuss] aclmode gone in S10u10?

2011-09-13 Thread Paul B. Henson

On 9/13/2011 5:21 AM, Peter Tribble wrote:


 Update 10 has been out for about 3 weeks.


Where was any announcement posted? I haven't heard anything about it. As 
far as I can tell, the Oracle site still only has update 9 available for 
download:


http://www.oracle.com/technetwork/server-storage/solaris/downloads/index.html


 I just tried on a U9 and U10 box. On the U10 system, I did a
 simple 'chmod g+s' on a directory with an ACL, and wham, the
 ACL vanished. Same operation on U9, and the ACL is preserved.


Meh, bogus :(. Thanks for the confirmation.




Re: [zfs-discuss] aclmode gone in S10u10?

2011-09-13 Thread Paul B. Henson

On 9/13/2011 12:46 PM, Peter Tribble wrote:


 Hm. They updated that a few weeks ago with a new release but you're right,
 it's now back to S10U9. Which leaves me with a whole slew of boxes running
 a release that doesn't exist.
 
  Oracle Solaris 10 8/11 s10x_u10wos_17a X86
    Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
   Assembled 13 July 2011


Interesting. For some reason it just doesn't surprise me 8-/.

Maybe it got recalled because it broke aclmode ;).

On a different but related note, I went to try and open a support ticket 
about update 10, and for the life of me I couldn't figure out how to 
tell it the product in question was Solaris. I'm not sure if something 
has changed since the last time I opened a bug (many months ago), but 
today the support site went beyond its usual "painful" all the way to 
"unusable" <sigh>.





Re: [zfs-discuss] aclmode gone in S10u10?

2011-09-13 Thread Paul B. Henson
On Tue, Sep 13, 2011 at 12:41:36PM -0700, Ian Collins wrote:
 Not work in what way?
 
 I have a client who makes extensive (more like excessive!) use of ACLs 
 on ZFS and we don't see any problems.  Other than the ridiculous 
 complexity of some of the ACLs that have grown over time.

From my perspective, the fact that any random process can come along and
do a legacy chmod() and destroy your ACL falls pretty clearly into the
"not work" camp <sigh>... That and the fact that an NFSv4 exclusive open
destroys an inherited ACL :(. If you can somehow keep things from trying
to poke legacy mode bits and don't use NFS, maybe you're OK.



[zfs-discuss] aclmode gone in S10u10?

2011-09-12 Thread Paul B. Henson
I recently saw a message posted to the sunmanagers list complaining
about installing a kernel patch and suddenly having his ACLs disappear
completely whenever a chmod occurred. I replied and asked him to check
if the aclmode property was gone, as it sounded like the unconditional
ACL discard that was (questionably) implemented in OpenSolaris/Solaris 11.
He confirmed it was, so it looks like the removal of aclmode was
backported to Solaris 10? I don't know exactly which kernel patch he
installed; it doesn't look like update 10 is out yet.

Can somebody in the know confirm whether or not aclmode is gone in
update 10? I didn't think they'd backport such a feature-disabling
change to Solaris 10; it doesn't seem to line up with the long-term stability
and compatibility that's supposed to be the benefit there...




Re: [zfs-discuss] [illumos-Developer] revisiting aclmode options

2011-08-03 Thread Paul B. Henson

On 8/2/2011 7:07 AM, Gordon Ross wrote:


 It seems consistent to me that a discard mode would simply never
 present suid/sgid/sticky.  (It discards mode settings.) After all,
 the suid/sgid/sticky bits don't have any counterpart in Windows
 security descriptors, and Windows ACLs use inherited $CREATOR_OWNER
 ACEs to do the equivalent of the sticky bit.


I see it somewhat differently; the purpose of discard is to prevent
any attempted change of the mode bits via chmod from affecting the ACL.
As you point out, there is no corresponding functionality in NFSv4 ACLs,
so by definition a change of the suid/sgid/sticky part of the mode bits
would not affect the ACL. And not allowing them to be changed would
result in lost functionality -- for example, setting the sgid bit on the
directory so the group owner is inherited on child directories, which is
actually quite valuable for the functionality of the group@ entry.

So I think the implementation of both a discard and deny aclmode
would need to incorporate the ability to modify the parts of the mode
bits that are not related to the ACL.




Re: [zfs-discuss] revisiting aclmode options

2011-07-21 Thread Paul B. Henson

On 7/19/2011 6:37 PM, Daniel Carosone wrote:


 If there were an ACL permission for "set legacy permission bits",
 as distinct from write_acl, that could be set to deny at whatever
 granularity you needed...


That does sound interesting, but given that it would most likely require an 
update to the NFSv4 ACL spec, it's not very probable, particularly in the 
short term...
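To make the suggestion concrete, here is a C sketch of how such a permission might be checked, assuming NFSv4's ordered evaluation where, for a single bit, the first matching entry that mentions it decides. `ACE4_WRITE_ACL` is the real RFC 3530 mask value; `HYPO_WRITE_LEGACY_MODE`, `ace_entry`, and the helper functions are invented for illustration, which is exactly why the idea would need a spec change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real NFSv4 ACE mask bit (RFC 3530): permission to modify the ACL. */
#define ACE4_WRITE_ACL          0x00040000u
/* HYPOTHETICAL: not in any NFSv4 spec; an arbitrary bit standing in for
 * "set legacy permission bits". */
#define HYPO_WRITE_LEGACY_MODE  0x00200000u

typedef enum { ACE_ALLOW, ACE_DENY } ace_kind;

typedef struct {
    ace_kind kind;
    uint32_t mask;
    bool     matches_caller;  /* does this entry's who (owner@, a user, ...) match? */
} ace_entry;

/* Scan entries in order; the first matching entry that mentions the bit decides. */
static bool ace_permits(const ace_entry *acl, int n, uint32_t bit)
{
    for (int i = 0; i < n; i++) {
        if (acl[i].matches_caller && (acl[i].mask & bit))
            return acl[i].kind == ACE_ALLOW;
    }
    return false;  /* never granted */
}

/* chmod would be gated on the hypothetical bit, independently of write_acl. */
static bool may_chmod(const ace_entry *acl, int n)
{
    return ace_permits(acl, n, HYPO_WRITE_LEGACY_MODE);
}

static bool may_write_acl(const ace_entry *acl, int n)
{
    return ace_permits(acl, n, ACE4_WRITE_ACL);
}
```

With a leading deny entry for the hypothetical bit, an owner could keep write_acl yet be unable to clobber the ACL through a legacy chmod.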




Re: [zfs-discuss] [illumos-Developer] revisiting aclmode options

2011-07-21 Thread Paul B. Henson

On 7/19/2011 7:10 PM, Gordon Ross wrote:


 The idea:  A new aclmode setting called "discard", meaning that
 the users don't care at all about the traditional mode bits.  A
 dataset with aclmode=discard would have the chmod system call and NFS
 setattr do absolutely nothing to the mode bits.


The caveat to that are the suid/sgid/sticky bits, which have no
corresponding bits in the ACL, and potentially will still need to be
manipulated. The details on that still need to be worked out :).


 The mode bits would be derived from the ACL such that the mode
 represents the greatest possible access that might be allowed by the
 ACL, without any consideration of deny entries or group memberships.


Is this description different from how the mode bits are currently 
derived when a ZFS ACL is set on an object?




[zfs-discuss] revisiting aclmode options

2011-07-18 Thread Paul B. Henson
Now that illumos has restored the aclmode option to zfs, I would like to 
revisit the topic of potentially expanding the suite of available modes. 
Some of you no doubt recall a fairly lengthy (and sometimes heated ;) ) 
discussion of this topic on the zfs-discuss mailing list a bit over a 
year ago; a fairly comprehensive thread archive appears to be available at:


http://opensolaris.org/jive/thread.jspa?messageID=463237#463237

The final outcome at the time was decided solely by Sun/Oracle as the 
arbiter of OpenSolaris, and their decision was to simply remove 
aclmode entirely. The basis for that decision was not necessarily 
technical merit, nor lack of a need for such a feature, but quite simply 
a business case analysis -- they felt it would cost them less to support 
an operating system without that particular tuning knob.


It's obvious that decision didn't agree with the community, as evidenced 
by the re-addition of the option in the open source illumos. I'm hoping 
that the community might also be more willing to consider the technical 
merits of additional flexibility in the option and be more focused on 
providing functionality than on minimizing support costs :).


My basic premise is that there should be some way to effectively treat a 
zfs filesystem as ACL-only; while mode bits will most likely be needed 
for quite some time for backwards compatibility, they should be treated 
as a second-class citizen, reflecting as closely as possible the 
intention of the underlying ACL, but in a read-only fashion, with no way 
to destroy the underlying ACL by manipulating them.


I initially proposed two extensions to aclmode. First, deny -- any 
attempt to execute a chmod that would result in a change to the 
underlying ACL would fail with a permission denied error. Second, 
discard -- any attempt to execute a chmod that would result in a 
change to the underlying ACL, assuming it would otherwise succeed, would 
appear to succeed but not actually change the permissions.


Clearly, these types of modes could cause problems for certain 
scenarios. On the other hand, the existing modes also cause problems for 
certain scenarios. Ideally, an administrator would have the flexibility 
to choose which problems he prefers to deal with :). It would be really 
nice if the aclmode could be specified on a per object level rather than 
a per file system level, but that would be considerably more difficult 
to achieve 8-/.


If illumos would be willing to consider integrating a change like this, 
I would like to discuss the technical details and determine the best 
possible implementation.


Thanks...




Re: [zfs-discuss] best migration path from Solaris 10

2011-03-24 Thread Paul B. Henson

On 3/21/2011 5:44 PM, Garrett D'Amore wrote:


 We do have support for running your own code using our API. It's just
 that we can't reasonably be expected to support people who want to do
 things like... oh, zpool import -f  (note the -f).  Or editing
 local configuration files that are also managed by the management
 software.


You wouldn't have to worry about the latter in my case, as I'd turn off
the management software and manage the configuration files automatically
along with the rest of my infrastructure ;).

The main additional components we would need to run would be apache with
mod_authz_fsacl, and a separate instance of apache with mod_perl that
provides the API our identity management infrastructure hooks into to
manage zfs file systems. We also replace syslog with syslog-ng, run
openntpd instead of xntpd, and run a variety of management tools such as
tenshi, aide, munin... Another project that's coming up is going to
require the creation of a captive service account for a CMS to sftp
files into user and group web directories.

I dunno that your support guys would be happy with the relatively
extensive changes we would make to the default state of your appliance
:). And I wouldn't particularly want to worry about having to get
approval to make changes when things come up that don't fit into the
out-of-the-box experience. So unfortunately NexentaStor most likely
won't fit our requirements; I kind of prefer a general-purpose operating
system over an appliance anyway.


 NCP 4 will have the same fixes that OpenSolaris has.  I'd be
 interested to know which bugs are most annoying for this person --
 we have a variety of them fixed in NS 3.1, but have not yet
 resync'ed NCP 3 (something we will do when 3.1 ships).


He mentioned in passing an NFSv4 OpenOwner lock problem that I'm
unfamiliar with, and a TCP/IP related kernel panic. He's on the list, so
I guess he could chime in with more details if he chooses.

One thing that's really biting me right now is the interaction between
NFS exclusive open, ACLs, and mode bits. Due to a limitation in the
protocol, the initial open has a mode of 0, and then the intended
creation mode is separately set later with a setattr. So the object
inherits the correct ACL on the open, and then the equivalent of a chmod
is performed, destroying it :(. Oracle most likely isn't going to fix it;
it's been a known issue since January 2005 (CR6215088, initially with
UFS ACLs, also breaks ZFS ACLs), and the ticket I opened about it was
closed. Unfortunately they didn't take the opportunity to fix it in NFSv4,
although it looks like it was addressed in NFSv4.1. It seems like it
should be possible to work around it in NFSv4; if we end up going with an
open source distribution, hopefully we can fix it ourselves. Or it would
be solved as a side effect of aclmode=ignore.



 On a general basis, it's hard to allocate engineers for ad-hoc
 projects like this mostly because I already have more work than I
 have engineers to perform the work.
 
 Oh, did I mention, we're hiring? :-)


I wish you the best of luck in hiring sufficient engineers to be able to
offer support for NCP or OpenIndiana :)...




Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Paul B. Henson

On 3/18/2011 3:15 PM, Garrett D'Amore wrote:


 a) Nexenta Core Platform is a bare-bones OS.  No GUI, in other words
 (no X11.)  It might well suit you.


Indeed :), my servers are headless (well, as headless as you can get on
x86 hardware 8-/; they do have an IPMI remote console that still needs
to be used occasionally <sigh>) and I generally install a minimal set of
packages. We have the X client libraries installed on some of our Linux
servers, as our DBAs like to run the GUI Oracle installer, but I don't
recall ever needing to run X software on our storage servers. One of my
many spats with Oracle technical support (the database side, not the
operating system side) was trying to get them to justify why the
xscreensaver package was listed as a core dependency of running 10g
under RHEL 5 :(. Never did get an answer to that, they just closed the
ticket out from under me...


 c) NCP 4 is still 5-6 months away.  We're still developing it.


By the time I do some initial evaluation and then some prototyping, I don't
anticipate migrating anything production-wise until Christmas break at the
earliest, so that timing shouldn't be a problem. Any thoughts on
how soon a beta might be available? As it sounds like there will be
significant changes, it might be better to evaluate with a beta of the
new stuff rather than the production version of the older stuff. Plus I
generally tend to break things in unexpected ways ;), so doing that in
the beta cycle might be beneficial.


 d) NCP 4 will make much more use of the illumos userland, and only
 use Debian when illumos doesn't have an equivalent.


Given that both NCP and OpenIndiana will be based on illumos, and as of
version 4 NCP will be migrating as much of the userland as possible to
Solaris rather than GNU tools, what do you feel will distinguish NCP from
OpenIndiana other than the differing packaging formats? Is NCP positioned
as a bare-bones server, whereas OpenIndiana is trying to be more general
purpose, including desktop use?


 e) NCP comes entirely unsupported.  NexentaStor is a commercial
 product with real support behind it, though.


Can you treat NexentaStor like a general purpose operating system, not
use the management gui, and configure everything from a shell prompt, or
is it more appliance like and you're locked out from the OS? In other
words, would it be possible (although not necessarily cost-effective) to
pay for NexentaStor for the support but treat it like NCP?

Has your company considered basic support contracts for NCP? I've heard
from at least one other site that might be interested in something like
that. We don't need much in the way of handholding, the majority of our
support calls end up being actual bugs or limitations in Solaris. But if
one of our file servers panics, doesn't import a pool when it boots, and
crashes every time you try to import it by hand, it would be nice to
have an engineer available :).

Thanks...




Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Paul B. Henson

On 3/18/2011 6:32 PM, David Magda wrote:


 Oracle has said that they will distribute updates to approved CDDL
 or other open source-licensed code following full releases of our
 enterprise Solaris operating system.
 
 http://unixconsole.blogspot.com/2010/08/internal-oracle-memo-leaked-on-solaris.html


Hmm, I dunno that I'd take a quote from a leaked internal memo as gospel 
;). For that matter, even if they flat out publicly announced it I can't 
say I'd trust them to actually follow through...





Re: [zfs-discuss] best migration path from Solaris 10

2011-03-21 Thread Paul B. Henson

On 3/21/2011 2:59 PM, Garrett D'Amore wrote:


 I *hate* talking about unreleased product schedules


:).


 but I think you can expect a beta within a month or two, perhaps less.
 We've already got an alpha that we've handed out in limited
 quantities.


Actually, I read about that alpha; one of my coworkers was at SCALE 9x, 
if I'd known at the time I would have had him pick up a CD ;).



 Once you dive under the controlled UI (which you can do), you
 basically are breaking your support contract.


Meh :(, that rules it out for me; I need to run our own custom stuff to 
integrate it into our identity management platform.



 add-on features like HA clustering, the management UI,
 auto-tiering/auto-sync, etc.


HA clustering I would actually be interested in, depending on pricing; 
but unfortunately not in an appliance-only availability.



 There have been some discussions, but figuring out how to make that
 commercially worthwhile is challenging


Agreed. If not support contracts, what about engineering services 
available on a time/materials basis? That would cover my main concern of 
having expertise available in case of a critical failure. There might 
also be occasions where a specific bug has already been identified, but 
local resources lack sufficient time or knowledge to efficiently fix it. 
One of the people I've spoken to off-line mentioned a handful of known 
OpenSolaris bugs he'd really like to see resolved in NCP and would be 
willing to pay somebody to make it happen.


Thanks for the info...




[zfs-discuss] best migration path from Solaris 10

2011-03-18 Thread Paul B. Henson
, and there's 
really no certainty that it ever will, I'm not sure how that's going to 
be accomplished. Particularly in the case of zfs, Oracle has moved 
forward with new pool versions which are not supported. Assuming Oracle 
does not release the corresponding source code, is the intention to 
duplicate the functionality with separately written code? And what will 
happen if for example an innovative new zfs feature is developed on the 
open source side, requiring a new pool version? Oracle seems to own that 
namespace.


My first inclination is to prefer OpenIndiana, which would seem to be 
the most flexible option. I particularly like to be able to fix 
something in the code myself and immediately deploy it and then try to 
get it accepted upstream rather than spending ages trying to get some 
support person to agree it should be done at all, let alone having an 
engineer update proprietary source code to implement it. I've got Sun 
(Oracle) support tickets that have been open for over a year waiting to 
get things fixed that I could have fixed myself.


Nexenta on the other hand has the only product ready for short term 
production deployment (well, ruling out Oracle, which I mostly have), as 
I don't think there is a firm timeline yet on a production ready version 
of OpenIndiana. Of course, given the major changes in NexentaCore 4, 
dropping a new deployment of NexentaCore 3 into place right now might 
not be the best idea either, it would probably be better to wait for 4 
to come out.


Sorry, I think I'm starting to ramble :). So, for those people with 
similar deployment needs as myself, with a large number of filesystems, 
a large population, and access via multiple protocols, what are you 
currently running? What do you plan to be running in the short to mid 
term? And why :)?





[zfs-discuss] ZFS ACL's broken over NFS

2010-12-06 Thread Paul B. Henson

As is altogether far too common an occurrence, we were having a problem
where a file was not inheriting the correct ACL, but rather a horribly
munged one resulting in incorrect permissions and security problems.

It appeared something was chmod'ing the file after creation, but despite
best efforts we simply could not find the culprit. After much
investigation, we determined the ACL was only broken when the open
specified O_EXCL.
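A minimal C reproducer for that observation, assuming it is run on an NFSv4 client inside a directory whose ACL has inheritable entries: create one file with `O_EXCL` and one without, then compare their ACLs on the server (on Solaris, `ls -v` prints the full ACL). The function name `create_file` and any paths you pass it are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create path with the given mode, optionally with O_EXCL (fail if it
 * already exists). Returns 0 on success or -errno on failure.
 */
static int create_file(const char *path, mode_t mode, bool exclusive)
{
    int flags = O_WRONLY | O_CREAT | (exclusive ? O_EXCL : 0);
    int fd = open(path, flags, mode);
    if (fd < 0)
        return -errno;
    close(fd);
    return 0;
}
```

On an affected server, only the file created with `exclusive = true` should come out with a munged ACL; locally the calls behave as ordinary POSIX creates.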

Upon submitting this issue to support for resolution, we were informed this
was a known problem, specifically CR#6215088. Due to a deficiency in the
NFS protocol, exclusive opens are split into an open and a setattr,
effectively chmod'ing the file upon creation.

This bug was opened in January *2005* against Solaris 9 and presumably UFS
ACLs. Still broken for ZFS ACLs almost 6 years later. Understandably,
the underlying issue is with the protocol; but still you'd think 6 years
would be enough time to implement a reasonable workaround.

They didn't fix this in the NFS 4 spec (why?), but there's some hope on the
distant horizon: the NFS 4.1 spec introduces the EXCLUSIVE4_1 create, which
will allow an exclusive create to be done atomically rather than as two
separate operations. Of course, Solaris would need to support NFS 4.1 (no
timeline available) and all clients of interest would need to do so as well
(again no timeline available), so that's not likely to be of much help
anytime soon.

As far as fixing the issue now? Last word from support was:

 Provide me with a detailed justification on why Oracle needs to fix this
 current bug. Please include a monetary value on how this impacts your
 company.

I guess fixing it because it's *broken* just isn't good enough. I guess
fixing it because it's a *security vulnerability that can result in
restricted files being world readable* just isn't good enough either.

According to our ISO, a breach of confidential student data that triggered
the California notification law would cost us anywhere from half a million
to a million dollars, so I guess I'll start with that number and see what
they say. I doubt if the lawyers would let me, but if that scenario
occurred I'd do my damnedest to include "This notification brought to you
courtesy of poor Oracle software security" in the letter ;)...




[zfs-discuss] live upgrade with lots of zfs filesystems -- still broken

2010-10-19 Thread Paul B. Henson


A bit over a year ago I posted about a problem I was having with live 
upgrade on a system with lots of file systems mounted:


http://opensolaris.org/jive/thread.jspa?messageID=411137#411137

An official Sun support call was basically just closed with no 
resolution. I was quite fortunate that Jens Elkner had made a workaround 
available which made live upgrade actually usable for my deployment 
(thanks again, Jens!). I would have been pretty screwed without it.


While still not exactly speedy, with the workaround in place live 
upgrade was fairly usable, and we've been using it for installing 
patches and upgrading to update releases with no problems.


Until now. Unfortunately, after installing the latest live upgrade 
patches on my existing U8 system in preparation for upgrading to U9, 
live upgrade has become even less usable than when I first tried it 
without the workaround in place. While creating a new BE was still 
reasonably quick, mounting it took over *six* hours to complete 8-/.


Whereas before most of the time was taken up by mounting/unmounting 
all the filesystems (resolved by Jens' patch), now the majority of the 
six hours was spent spinning in /etc/lib/lu/plugins/lupi_bebasic. I 
don't know exactly what it was doing (the source code to live upgrade 
does not appear to be available), but for most of those six hours it 
seems to have been comparing strings:


# pstack 1670
1670:   /etc/lib/lu/plugins/lupi_bebasic plugin
 fee05973 strcmp   (8046474, 8046478) + 1c3
 fef6ae45 lu_smlGetTagByName (806920c, 16ef, fefa0f30) + 74
 fef71717 lu_tsfSearchFields (806920c, 0, 3, 2, 1, 88369f4) + 13f
 fef4e2da lu_beoGetFstblFilterSwapAndShared (80513bc, 8046978, 8069234, 806920c) + 1be
 fef4f1f7 lu_beoGetFstblToMountBe (80513bc, 80541e4, 80469c4, 80513fc) + 247
 fef515cf lu_beoMountBeByBeName (80513bc, 8046a24, 805419c, 80513fc, 0, 0) + 39c
 0804ba6c  (804ef6c, 1, 8068dd4, 0, 8069ee4, 8069ee4)
 fef5fa2b  (804ef6c, 8046f3c, 8069ee4, 8046ae8)
 fef5f5c3  (804ef6c, 8046f3c, 8069ee4, 8046ae8)
 fef5f397  (804ef6c, 8046f3c, 8069ee4)
 fef5f1c5  (804ef6c, 8046f3c, 8069ee4)
 fef603c6  (804ef6c)
 fef5ec12 lu_pluginProcessLoop (804ef6c) + 42
 0804a028 main (2, 8046fa8, 8046fb4) + 2d3
 08049cba  (2, 80471d8, 80471f9, 0, 8069954, 8069914)

Six hours, fully utilizing a CPU core, comparing strings 8-/.

I considered opening a support ticket, but given the lack of response 
previously, I decided to poke around with it a bit myself first. truss 
of lumount revealed that getmntent was being called to enumerate mount 
points, so initially I tried preloading a shared library to interpose 
the getmntent call and skip all the mount points corresponding to my 
data file systems under /export.


That didn't make any difference. I then moved on to the multiple 
calls to the zfs binary made by lumount, which seemed like potential 
sources of extraneous data that could cause unnecessary processing. 
Replacing /sbin/zfs with a wrapper script yielded quite unexpected 
results, as it seems there are many links to the zfs binary, which does 
different magic depending on the value of argv[0] 8-/.


The path to the zfs binary is statically defined in 
/etc/lib/lu/liblu.so.1, so in a display of horrid kludginess ;), I 
edited the binary file and replaced all instances of /sbin/zfs with 
/sbin/zfb, and created /sbin/zfb with the content:


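---
#! /bin/sh

# Installed as /sbin/zfb: pass everything through to the real /sbin/zfs,
# but strip the file systems live upgrade is configured to ignore (per
# Jens' patch) out of "zfs list" output.
. /etc/default/lu
LUBIN=${LUBIN:=/usr/lib/lu}
. $LUBIN/lulib

# Quote "$1" and "$@" so an empty first argument or dataset names with
# spaces don't break the test or get re-split.
if [ "$1" = list ] ; then
/sbin/zfs "$@" | /usr/bin/egrep -v -f `lulib_get_fs2ignore`
else
exec /sbin/zfs "$@"
fi
---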

This utilizes the configuration included in Jens' patch to ignore the 
exact same set of file systems ignored by the rest of live upgrade with 
the patch installed.


With this kludge in place lumount took *23 seconds*, three orders of 
magnitude less time.
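The filtering step in the wrapper boils down to an inverted egrep against a file of patterns. A minimal sketch of just that step, assuming lulib_get_fs2ignore yields the path of a file holding one extended regex per line (not verified against what Jens' patch actually emits); filter_ignored is a made-up name, and grep -E stands in for /usr/bin/egrep:

```sh
#!/bin/sh
# Sketch only: filter_ignored is a hypothetical helper, not part of live
# upgrade or Jens' patch. It drops every line of "zfs list" output (read on
# stdin) that matches one of the extended regexes in the file named by $1.
# Assumes a POSIX grep (on Solaris 10, /usr/xpg4/bin/grep).

filter_ignored() {
    patterns=$1

    # A blank line in the pattern file is an empty regex that matches every
    # input line; combined with -v it would hide the entire listing.
    if grep -q '^$' "$patterns"; then
        echo "filter_ignored: blank line in $patterns" >&2
        return 1
    fi

    # -v inverts the match, so only datasets NOT listed in the file survive.
    grep -E -v -f "$patterns"
}

# Example (hypothetical pool layout and pattern):
#   printf '%s\n' '^export/(user|group)/' > /tmp/fs2ignore
#   zfs list -H -o name,mountpoint | filter_ignored /tmp/fs2ignore
```

The blank-line check is the one design point worth calling out: a stray empty line in the ignore file would otherwise make every dataset vanish from live upgrade's view without any error.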


I tend to tilt at windmills, so I probably will end up opening another 
support ticket. The last time there seemed to be no interest in fixing 
live upgrade so it would actually scale :(, maybe this time I'll have 
better luck.


For those Oracle employees in the audience, if anyone could possibly 
explain exactly what processing lupi_bebasic is doing that results in 
six hours of string comparisons, I'm dying of curiosity :). And if 
anyone wants to jump up and champion the cause of getting live upgrade 
to work in an environment with many file systems, I'd be happy to help; 
it would be nice to have shipped code that works without breaking out 
the hex editor ;).


Thanks...


--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Migrating to an aclmode-less world

2010-10-07 Thread Paul B. Henson
On Tue, 5 Oct 2010, Nicolas Williams wrote:

 Right.  That only happens from NFSv3 clients [that don't instead edit the
 POSIX Draft ACL translated from the ZFS ACL], from non-Windows NFSv4
 clients [that don't instead edit the ACL], and from local applications
 [that don't instead edit the ZFS ACL].

You mean the vast majority of applications in existence ;)? Other than
chmod(1) in Solaris, and nfs4_(get|set)facl in Linux, can you name off the
top of your head *any* other applications that grok ZFS/NFSv4 ACLs (as
opposed to blindly chmod'ing stuff and breaking your access control
<sigh>)? (and GUI front ends to chmod/nfs4_(get|set)facl don't count :) ).

I'm still waiting for the bug in Solaris chgrp that breaks ACLs to get
fixed; I reported that last year sometime. And *that's* a core component of
the Solaris OS itself; what's the chance of a timely response from a 3rd
party vendor whose application doesn't play nicely with ACLs?

<broken record>
If only there was some way to keep applications from screwing up your ACLs
with inappropriate uses of chmod...
</broken record>


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] zfs proerty aclmode gone in 147?

2010-09-28 Thread Paul B. Henson
On Sat, 25 Sep 2010, Ralph Böhme wrote:

 Darwin ACL model is nice and slick, the new NFSv4 one in 147 is just
 braindead. chmod resulting in ACLs being discarded is a bizarre design
 decision.

Agreed. What's the point of ACLs that disappear? Sun didn't want to fix
acl/chmod interaction, maybe one of the new OpenSolaris forks will do the
right thing...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] zfs proerty aclmode gone in 147?

2010-09-28 Thread Paul B. Henson
On Tue, 28 Sep 2010, Nicolas Williams wrote:

 I've researched this enough (mainly by reading most of the ~240 or so
 relevant zfs-discuss posts and several bug reports)

And I think some fair fraction of those posts were from me, so I'll try not
to start rehashing old discussions ;).

 That only leaves aclmode=discard and some variant of aclmode=groupmask
 that is less confusing.

Or aclmode=deny, which is pretty simple, not very confusing, and basically
the only paradigm that will prevent chmod from breaking your ACL.

 So one might wonder: can one determine user intent from the ACL prior to
 the change and the mode/POSIX ACL being set, and then edit the ZFS ACL
 in a way that approximates the user's intention?

You're assuming the user is intentionally executing the chmod, or is even
*aware* of it happening. Probably at least 99% of the chmod calls executed
on a file with a ZFS ACL at my site are the result of non-ACL-aware legacy
apps being stupid. In which case the *user* intent is to *leave the damn
ACL alone* :)...

 But much better than that would be if we just move to a ZFS ACL world
 (which, among other things, means we'll need a simple libc API for
 editing ACLs).

Yep. And a good first step towards an ACL world would be providing a way to
keep chmod from destroying ACLs in the current world...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] zfs proerty aclmode gone in 147?

2010-09-28 Thread Paul B. Henson

On 9/28/2010 2:13 PM, Nicolas Williams wrote:

 Or aclmode=deny, which is pretty simple, not very confusing, and 
 basically the only paradigm that will prevent chmod from breaking your ACL.

 That can potentially render many applications unusable.

Yes. Which is why it obviously wouldn't be the default, and probably 
highly inadvisable for a root pool. But for a data file system whose 
usage model demands pure ACLs, which you want to make sure are never 
discarded, why shouldn't it be available?


 Suppose you make chmod(1) not use chmod(2)

Now you're just being pedantic ;).

 But then what about scripts that use chmod(1)?

The user/administrator who intentionally elected to implement the 
nondefault option clearly cares more about not losing their ACLs than 
about some random scripts failing. It seems the vast majority of 
arguments against implementing this option have been of the form "bad 
thing foo might happen, and the user would be confused|angry|sad." 
It's my foot; I don't appreciate people trying to prevent me from 
shooting it, particularly if there happens to be some poisonous animal 
sitting on it that is the actual target :).


The version of samba bundled with Solaris 10 seems to insist on 
chmod'ing stuff. I've tried all of the various options that should 
disable mapping to mode bits, yet still, randomly, when people copy files 
in over CIFS, ACLs get destroyed by chmod interaction and access 
control is broken. I finally ended up having to preload a shared object 
that overrides chmod and turns it into a no-op. This would have been 
the perfect scenario for aclmode=ignore: I don't care what samba may or 
may not want to do; if the file has an ACL, I don't want it mucked with. 
So why exactly shouldn't I have that functionality available for this 
scenario?


 Seems like a lot of work for little gain, and high
 support call generation rate.

Yup. I agree that any attempt to impose sanity on the application of 
chmod onto an object with an ACL by somehow combining the existing ACL 
with the new mode is pointless. It seems that there are three reasonable 
options when attempting to update the permissions of an object with an 
ACL via the legacy chmod call -- just discard the ACL, deny the 
attempted change, or ignore it.


 Yep. And a good first step towards an ACL world would be providing a 
 way to keep chmod from destroying ACLs in the current world...

 I don't think that will happen...

Not in the official Oracle version of Solaris, but given Oracle's 
choices it seems that's not going to be the only version of the 
OpenSolaris code base floating around. Maybe the Illumos community will 
be a little more open to providing users with functionality that could 
be extremely valuable in some scenarios and doesn't hurt anybody who 
doesn't intentionally choose to use it... I said before I'd be willing 
to implement it myself if there was some reasonable likelihood it would 
be accepted, once the whole Illumos thing settles down I think I'll make 
that offer again and see what happens.



--
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] ZFS development moving behind closed doors

2010-08-19 Thread Paul B. Henson
On Wed, 18 Aug 2010, Erik Trimble wrote:

 While there were certainly a few folks who ran OpenSolaris in production
 (who absolutely needed the new features and couldn't wait until they made
 it to Solaris 10),

Or those features that simply were never going to be backported to S10,
particularly the in-kernel CIFS server... We were planning on migrating
from S10 to OpenSolaris, and that was one of the major reasons. If
OpenSolaris 3/2010 had actually been released, we might have even been
there now...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-14 Thread Paul B. Henson
On Tue, 13 Jul 2010, Edward Ned Harvey wrote:

 It is true there's no new build published in the last 3 months.  But you
 can't use that to assume they're killing the community.

Hmm, the community seems to think they're killing the community:


http://developers.slashdot.org/story/10/07/14/1448209/OpenSolaris-Governing-Board-Closing-Shop?from=rss


ZFS is great. It's pretty much the only reason we're running Solaris. But I
don't have much confidence Oracle Solaris is going to be a product I'm
going to want to run in the future. We barely put our ZFS stuff into
production last year but quite frankly I'm already on the lookout for
something to replace it.

No new version of OpenSolaris (which we were about to start migrating to).
No new update of Solaris 10. *Zero* information about what the hell's going
on...

ZFS will surely live on as the filesystem under the hood in the doubtlessly
forthcoming Oracle database appliances, and I'm sure they'll keep selling
their NAS devices. But for home users? I doubt it. I was about to build a
big storage box at home running OpenSolaris; I've frozen that project. Oracle
is all about the money. Which I guess is why they're succeeding and Sun
failed to the point of having to sell out to them. My home use wasn't
exactly going to make them a profit, but on the other hand, the philosophy
that led to my not using the product at home is a direct cause of my lack
of desire to continue using it at work, and while we're not exactly a huge
client we've dropped a decent penny or two in Sun's wallet over the years.

Who knows, maybe Oracle will start to play ball before August 16th and the
OpenSolaris Governing Board won't shut themselves down. But I wouldn't hold
my breath.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-14 Thread Paul B. Henson
On Wed, 14 Jul 2010, Roy Sigurd Karlsbakk wrote:

 Once the code is in the open, it'll remain there. To quote Cory Doctorow
 on this, it's easy release the source of a project, it's like adding ink
 to your swimming pool, but it's a little harder to remove the ink from
 the pool...

Woo-hoo, the code already released won't be taken back ;). But considering
virtually all zfs development has been and presumably will continue to be
by Sun/Oracle employees, that code is going to get stale pretty quick if
they stop contributing to it...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] One dataset per user?

2010-06-24 Thread Paul B. Henson
On Tue, 22 Jun 2010, Arne Jansen wrote:

 We found that the zfs utility is very inefficient as it does a lot of
 unnecessary and costly checks.

Hmm, presumably somebody at Sun doesn't agree with that assessment or you'd
think they'd take them out :).

Mounting/sharing by hand outside of the zfs framework does make a huge
difference. It takes about 45 minutes to mount/share or unshare/unmount
with the mountpoint and sharenfs zfs properties set; mounting/sharing by
hand with SHARE_NOINUSE_CHECK=1, even just sequentially, took only about 2
minutes. With some parallelization I could definitely see hitting the 10
seconds you mentioned, which would sure make my patch windows a hell of a
lot shorter. I'll need to put together a script and fiddle some with SMF
(joy oh joy), since I need these filesystems mounted before the web server
starts.
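For what it's worth, a by-hand mount/share pass with some parallelism might look roughly like the following. It is only a sketch under assumptions not taken from the thread: each filesystem has mountpoint=legacy and sharenfs=off so the zfs framework stays out of the way, its mountpoint directory exists, and "dataset mountpoint" pairs arrive on stdin. mount_all, mount_and_share, MOUNT_CMD, SHARE_CMD and BATCH are hypothetical names; SHARE_NOINUSE_CHECK=1 is simply passed to share as in the tip above, without verifying what else it affects.

```sh
#!/bin/sh
# Sketch: mount and NFS-share ZFS filesystems by hand, BATCH at a time.
# Hypothetical assumptions: every filesystem has mountpoint=legacy and
# sharenfs=off, its mountpoint directory already exists, and stdin supplies
# one "dataset mountpoint" pair per line. Needs a POSIX shell (on Solaris 10,
# /usr/xpg4/bin/sh or ksh rather than the legacy /bin/sh).

MOUNT_CMD=${MOUNT_CMD:-/usr/sbin/mount}
SHARE_CMD=${SHARE_CMD:-/usr/sbin/share}
BATCH=${BATCH:-16}   # number of mount/share pairs allowed to run at once

mount_and_share() {
    # $1 = dataset, $2 = mountpoint; only share if the mount succeeded.
    "$MOUNT_CMD" -F zfs "$1" "$2" &&
        SHARE_NOINUSE_CHECK=1 "$SHARE_CMD" -F nfs -o rw "$2"
}

mount_all() {
    n=0
    while read -r fs mp; do
        [ -n "$fs" ] || continue          # skip blank lines
        mount_and_share "$fs" "$mp" &     # run this pair in the background
        n=$((n + 1))
        if [ "$n" -ge "$BATCH" ]; then    # batch full: let it drain
            wait
            n=0
        fi
    done
    wait                                  # wait for the final partial batch
}

# Example (map file path is hypothetical):
#   mount_all < /etc/local/zfs-share.map
```

Batch-and-wait is used instead of a rolling job pool because POSIX sh has no portable way to wait for just one of several jobs (wait -n is a bash extension); the cost is that one slow mount holds up the rest of its batch.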

Thanks much for the tip!

I'm hoping someday they'll clean up the sharing implementation and make it
a bit more scalable. I had a ticket open once and they pretty much said it
would never happen for Solaris 10, but maybe sometime in the indefinite
future for OpenSolaris...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] One dataset per user?

2010-06-22 Thread Paul B. Henson
On Sun, 20 Jun 2010, Arne Jansen wrote:

 In my experience the boot time mainly depends on the number of datasets,
 not the number of snapshots. 200 datasets is fairly easy (we have 7000,
 but did some boot-time tuning).

What kind of boot tuning are you referring to? We've got about 8k
filesystems on an x4500, it takes about 2 hours for a full boot cycle which
is kind of annoying. The majority of that time is taken up with NFS
sharing, which currently scales very poorly :(.

Thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] ZFS, NFS, and ACLs ssues

2010-05-03 Thread Paul B. Henson
On Mon, 3 May 2010, Mary Ellen Fitzpatrick wrote:

 I want to use autofs on the remote clients,  as I have many dirs that
 need to be exported from /zp-ext/test/*

 Here is the auto.home on the client, as setup for the user mfitzpat.  I
 really do not want to edit the auto.home file for each user.
 mfitzpat   -rw,hard,intr   hecate:/zp-ext/test/mfitzpat

 Is there a way to set permissions so that the /etc/auto.home file on the
 clients does not list every exported dir/mount point?

If all of the directories to be mounted are on the same server in the same
place, you can use a wildcard entry:

* -rw,hard,intr hecate:/zp-ext/test/&


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] ZFS, NFS, and ACLs ssues

2010-04-29 Thread Paul B. Henson
On Thu, 29 Apr 2010, Mary Ellen Fitzpatrick wrote:

 hecate:/zp-ext/test zfs get sharenfs zp-ext/test/mfitzpat
[...]
 hecate:/zp-ext/test chown -R mfitzpat:umass mfitzpat
[...]
 test   -rw,hard,intr   hecate:/zp-ext/test
[...]
 drwxr-xr-x+ 2 root root 2 Apr 29 11:15 mfitzpat

Unless I'm missing something, you chown'd the filesystem
zp-ext/test/mfitzpat but you mounted the filesystem zp-ext/test; hence
you're seeing the mount point for the mfitzpat filesystem in the
zp-ext/test filesystem over NFS, not the actual zp-ext/test/mfitzpat
filesystem.

Pending the availability of mirror mounts
(http://hub.opensolaris.org/bin/download/Project+nfs-namespace/files/mm-PRS-open.html),
you need to mount each ZFS filesystem you're exporting via NFS separately.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] ZFS aclmode property

2010-03-08 Thread Paul B. Henson
On Sat, 6 Mar 2010, Paul B. Henson wrote:

 If you have a Sun support contract, open a support call and ask to be added
 to SR #72456444, which is the case I have open to try and get a better
 solution to chmod/ACL interaction.

CR#6933018 has been created for this issue; for any interested parties
who'd like to track it...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] ZFS aclmode property

2010-03-06 Thread Paul B. Henson
On Sat, 6 Mar 2010, Ralf Utermann wrote:

 we recently started to look at a ZFS based solution as a possible
 replacement for our DCE/DFS based campus filesystem (yes, this is still
 in production here).

Hey, a fellow DFS shop :)... We finally migrated the last production files
off of DFS last month, I'm actually going to pull the plug on the
infrastructure within a couple of weeks. It will be nice not to have to
worry that software that's been unsupported for years will go blooey :(.

 The ACL model of the combination OpenSolaris+ZFS+in-kernel-CIFS+NFSv4
 looks like a really promising setup, something which could place it high
 up on our list ...

Indeed, while we're currently running S10 with samba (our development
started before OpenSolaris support was announced; we're hoping to migrate
sometime this year), Solaris/ZFS was the best option we could find to
replace our DFS infrastructure. The main thing I miss is the location
independence and ability to migrate data between servers while it's in use.
Other than this annoying chmod/ACL issue, our only other major problem is
lack of scalability in NFS sharing, it takes a good 45 minutes to
share/unshare the 8000 filesystems on each of our X4500s (we have 5),
resulting in about a 2-hour reboot cycle :(. There's an open bug on it;
they say it will never be addressed in Solaris 10, though hopefully it will
be someday in OpenSolaris.

 So from this site: we very much support the idea of adding ignore and
 deny values for the aclmode property!

If you have a Sun support contract, open a support call and ask to be added
to SR #72456444, which is the case I have open to try and get a better
solution to chmod/ACL interaction. If you're thinking of spending a lot of
money on Sun hardware, bring this issue up to your sales guy and push for a
solution. I think part of the problem is very few sites actually use ACLs,
particularly to the extent people coming from a DFS background are used to
:(.

 However, reading PSARC/2010/029, it looks like we will get
 aclmode=discard for everybody and the property removed. I hope this is
 not the end of the story ...

As do I, but so far it's not looking too good. I discussed my proposal with
Mark Shellenbaum (the author of that PSARC case), and he was pretty
strongly against it. I thought I made some rather good points, but as I'm
sure you saw from the threads you referenced there are quite strong
opinions on both sides. He seems to be Sun's main guy when it comes to
ACL's; if he was on board it would be a lot more likely to happen, but I
never heard back from him on my response to his initial reply detailing
why he thought it was a bad idea, and he was
conspicuously absent during the recent list free-for-all...

As I've offered before, I'll implement it if they'll merge it...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-03 Thread Paul B. Henson
On Wed, 3 Mar 2010, David Dyer-Bennet wrote:

 It's the normal way to do it; not sure where in the Linux world it arose,
 but I first saw it in some early distribution.  It's done automatically by
 adduser.   In my perception, it's best practice.  So the question is,
 why do you NOT want to do it?

It's the historical way to do it. Best practices change over time. As
I've already indicated, I would get no benefits from such a practice, and
it would result in 7 extra unnecessary groups in my environment. It
used to be common practice to leave your smtp servers as open relays, would
you have argued against locking them down because implementing smtp
authentication was too hard for you? It used to be common practice to
access servers via telnet, would you have argued against the deployment of
ssh because you didn't want to learn how to configure it? Your basic
premise in this argument seems to be that the tools to create a pure-ACL
environment shouldn't be made available to anyone because you don't
understand ACL's, they're too hard (for you) to use, and you would have to
change how you do things.

 I can't believe in that model. If I buy it, every time I consider a
 script set or application for use, I have to do extensive testing to
 verify that it works in the model.

And conversely, every time I consider a script or application for my
current deployment, I have to do extensive testing to make sure it doesn't
break my ACL's. It can't always be about you.

 And I have to deal with users having that problem on their own.  This is
 far, far too expensive to give any serious consideration to.

The option I propose would only be configurable by someone who had
privileges to update zfs filesystem properties. If that's the sysadmin,
clearly he wants it that way. If end users are delegated that privilege,
they must be deemed competent enough to shoot their own foot.

 The command-line interface to ACLs is confusing, possibly weak.

Matter of perspective. I don't have much trouble with it, and if I did I'd
write my own interface (as I've done before
http://www.cpan.org/modules/by-authors/id/PHENSON/aclmod-1.1.readme).

 Generally you have to work on things for a while before they get
 widespread adoption, especially by outside implementers.  It's entirely
 possible that NFS V4 will be widely used, but from what I read on linux
 sysadmin forums, it isn't yet.

Somebody has to go first or it's a chicken and egg problem. And it seems
reasonable that the people who do go first will run into issues that need
new best practices to be deployed, no? And then it kind of sucks that
people who aren't even using the technology cry that models shouldn't
change while fearfully grasping their buggy whips ;).

 When a tool is there, people will want to use it.  When using it causes
 me endless trouble, I don't want them to use it.

If it's your server, you choose whether the option is used. If it's not
your server, or the user has been granted permission to manage their own
filesystem, then it seems it's not really up to you, and I'm not sure why
you think you should be able to dictate what other people do with their own
environments.

 We're all arguing for what will work best for us, I think; selfish, but I
 hope in the sanely selfish region.

The difference is I'm arguing for functionality that I need and will be
valuable in my environment. You're arguing that someone else shouldn't be
able to get the functionality they need because you won't be happy if it
exists at all, even if no one forces you to use it. It seems that's an
entirely different grade of selfishness, like a bully who knocks you down
and steals your lunch but just throws it away because he doesn't like
peanut butter :(. It's not about getting something you need but just
keeping someone else from having it.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-03 Thread Paul B. Henson
On Wed, 3 Mar 2010, David Dyer-Bennet wrote:

 I don't think it will work as well for you as you think it will; I think
 you'll then find yourself complaining that backup systems don't work, and
 indexing systems don't work, and this doesn't work, and that doesn't work,
 all because you've broken the underlying model.

Thanks for the concern :), but I think I know my potential use cases pretty
well. I don't know why backups would fail, they shouldn't be wandering
around changing permissions. And our backup system supports ZFS ACL's
anyway. Indexing systems? It's not a windows box ;). I doubt it would be
wise to configure this hypothetical option on a root pool, but as far as
I'm concerned, on my user/group data filesystems, this would be *fixing*
the underlying model (pure-ACL), not breaking it.

 And I have a definite fear that it'll end up impacting me, that not
 using it won't be as clear an option as you think it will.

Technology changes; it's a bad field to be in for the change averse :).


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Kjetil Torgrim Homme wrote:

 no.  what happens when an NFS client without ACL support mounts your
 filesystem?  your security is blown wide open.  the filemode should
 reflect the *least* level of access.  if the filemode on its own allows
 more access, then you've lost.

Say what?

If you're using secure NFS, access control is handled on the server side.
If an NFS client that doesn't support ACL's mounts the filesystem, it will
have whatever access the user is supposed to have, the lack of ACL support
on the client is immaterial.

On the other hand, if you're using AUTH_SYS, you don't care about security
in the first place so there's no real point in worrying about it.

 if your ACLs are completely specified and give proper access on their
 own, and you're using aclmode=passthrough, chmod -R 000 / will not
 harm your system.

Actually, it will destroy the three special ACEs: owner@, group@, and
everyone@. On the other hand, with a hypothetical aclmode=ignore or
aclmode=deny, such a chmod would indeed not harm the system.

 if you have rogue processes doing chmod a+rwx or other nonsense, you
 need to fix the rogue process, that's not an ACL problem or a problem
 with traditional Unix permissions.

What I have are processes that don't know about ACL's. Are they broken? Not
in and of themselves, they are simply incompatible with a security model
they are unaware of. Why on earth would I want to go and try to make every
single application in the world ACL aware/compatible instead of simply
having a filesystem which I can configure to ignore any attempt to
manipulate legacy permissions?

 not at all.  you just have to use them correctly.

I think we're just not on the same page on this; while I am not saying I'm
on the right page, it does seem you need to do a little more reading up on
how ACL's work.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, David Dyer-Bennet wrote:

 Hmmm; the lack of flexibility you talk about comes from not using the
 security model sensibly -- having per-person groups is very useful in
 that security model.

I have 70-odd thousand users. Why would I want to also have 70 thousand
groups with only one user in each one? From an authorization perspective,
the user and group are identical. The absolute only reason to implement
such a duplicative environment is so you can have one umask, but still be
able to control whether or not someone other than the user gets permissions
on new files. In a world with inheritable ACL's, you don't need to do that.

 You see it as a legacy security model; but for me it's the primary
 security model, with ACLs as an add-on.  It's the only one that's
 supported across the various ways of sharing the disks. In the end,
 Solaris is one player in the POSIX world, and cutting yourself off from
 that would be very limiting.

If the design requirements of your filesystem require backward
interoperability, then yes. On the other hand, if they don't, and you would
be better served with a pure-ACL deployment, why hold yourself down with
the chains of a security model you don't need?

 It's precisely to avoid having shell access being a poor stepchild that
 I'm resisting ACLs.  As currently implemented, they relegate my primary
 access to the system to second-class status.

How so? Do you mean operating in a shell on a system with no ACL support?

 And NFSv4 is mostly a rumor in my universe; NFSv2 and v3 are what people
 actually use.

Really? We've deployed NFSv4 here, and other than this ACL/chmod issue it's
working great. I think I'd rather design my future technology based on the
needs and possibilities of the future, not on the past. From that
perspective, why should Sun bother to work on NFSv4 at all if nobody uses
it?

Again, I'm not advocating removing any current functionality or depriving
you of anything you currently have. I'm simply requesting an additional
feature set that would be very useful for some deployments. I'm not really
sure why people keep arguing that it would not be good for their own
deployment, and treating that as a reason it should not be implemented at
all; that seems a bit self-centered.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Bill Sommerfeld wrote:

 While we're designing on the fly: Another possibility would be to use an
 additional umask bit or two to influence the mode-bit - acl interaction.

I think trying to keep shoving a square peg into a round hole is
simply the wrong thing to do; rather than trying to force together
different security models, provide an option that gets rid of the unwanted
security model and lets the other one just work.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Marion Hakanson wrote:

 The answer is, It depends.  If the NetApp volume is NTFS-only
 permissions, then chmod from the Unix/NFS side doesn't work, and you can
 only manipulate permissions from Windows clients.  If it's a mixed
 security-style volume, chmod from the Unix/NFS side will delete the NTFS
 ACL's, and the SMB clients will see faked-up ACL's that match the new
 POSIX permissions.  Whichever side made the most recent change will be in
 effect.

We evaluated NetApp before selecting ZFS; my general summary of their
permissions implementation is that it's messy, confusing, and inconsistent.
Even with these chmod issues, the ZFS implementation is far superior,
particularly for sharing the same ACL's between NFSv4 and CIFS.

 BTW, our experience has been that NFSv4 on NetApp does not work very
 well, and NetApp support folks have advised us to not use it in order to
 avoid crashing the filer.  They of course blame the various incompatible
 NFSv4 client implementations out there

Indeed; other than a few minor issues here and there, NFSv4 with Solaris
servers, and both Linux and Solaris clients has been working great.

And I don't really think a bad client should be able to crash the server;
regardless of the client problems that's a server bug.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson

Yes. Yes. Yes. I agree with every one of your points in this message :).

On Tue, 2 Mar 2010, Nicolas Williams wrote:

 Well, I think the bit, if we must have one, belongs in the filesystem
 objects that have ACLs, as opposed to processes.  There may be no umask
 to apply in remote access cases, so using a process attribute is likely
 to result in different behavior according to the access protocol and
 client.  That might not be surprising for the CIFS case, but it certainly
 would be for the NFS case.

 But also I think it's the owner of an object that should decide what
 happens to the object's ACL on chmod rather than random programs and user
 environments.

 We might need multiple bits, but we do have multiple bits to play with in
 mode_t.  The main issue with adding mode_t bits is going to be: will apps
 handle the appearance of new mode_t bits correctly?  I suspect that they
 will, or at least that we'd consider it a bug if they didn't.  Or we could
 add a new file attribute.

 But given cheap datasets, why not settle for a suitable dataset property
 as a starting point.  I.e., maybe we could play with aclmode a little
 more.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
 that will necessarily be lossy in many
 cases.

It's fairly easy to downconvert an ACL into mode bits; owner@, group@, and
everyone@ come pretty close. Making up a set of mode bits to feed to
non-ACL-aware applications makes perfect sense. My problem is going the
other way, trying to take a set of mode bits and somehow upscale it into an
ACL is broken for my needs. Yes, some people might need that capability,
and I absolutely support that they should have it -- but I'd like to have
the capability to not have it.
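The asymmetry can be sketched in C with a simplified, allow-only ACL model. That model is an assumption: real ZFS mode computation also accounts for deny entries and for everyone@ applying to the owner and group. `synth_mode()` is a hypothetical helper, not a ZFS interface.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical simplified allow-only ACE (rwx bits only). */
enum who { OWNER_AT, GROUP_AT, EVERYONE_AT, NAMED_USER, NAMED_GROUP };

struct ace {
    enum who who;
    unsigned perms; /* 04 = read, 02 = write, 01 = execute */
};

/* Synthesize mode bits for ACL-unaware clients. Named entries have
 * no slot among the nine mode bits, so they are simply dropped:
 * that is the lossy step, and why the reverse mapping is ambiguous. */
mode_t synth_mode(const struct ace *acl, size_t n)
{
    assert(acl != NULL || n == 0);
    mode_t mode = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned p = acl[i].perms & 07;
        switch (acl[i].who) {
        case OWNER_AT:    mode |= (mode_t)(p << 6); break;
        case GROUP_AT:    mode |= (mode_t)(p << 3); break;
        case EVERYONE_AT: mode |= (mode_t)p;        break;
        default:          break; /* no mode-bit equivalent */
        }
    }
    return mode;
}
```

Two ACLs that differ only in a named-user entry both come out as 0750, so the mode alone cannot tell you which ACL to rebuild.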

  If you only ever access ZFS via CIFS from windows clients, you can have a
  pure ACL model. Why should access via local shell or NFSv4 be a poor
  stepchild and chained down with legacy semantics that make it exceedingly
  difficult to actually use ACL's for their intended purpose?

 I am certainly not advocating that.

Good :). I am certainly not wedded to my proposal, if some other solution
is proposed that would meet my requirements, great. However, pretty much
all of the advice has boiled down to either "ACL's are broken, don't use
them" or "why would you want to do *that*?", which isn't particularly
useful.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Nicolas Williams wrote:

 BTW, it should be relatively easy to implement aclmode=ignore and
 aclmode=deny, if you like.

I looked over the code some, and from an intuitive point of view it didn't
seem like it would be that hard; thanks for the pointers.

I'm absolutely willing to put my code where my mouth is, I'd happily
implement this functionality myself. However, unless I can get it included
upstream, that's not really going to resolve my problem. I don't think it
would be possible to replace the Solaris 10 kernel with my own locally
compiled one, and while I theoretically could for opensolaris, I doubt that
would be considered a supportable configuration. I'd certainly take
responsibility for any bugs with my own code, but I don't really have the
resources to support the entire kernel/OS on my own :(, and I'm pretty sure
my management wants a phone number to call and an entity to point their
finger at ;).

I was kind of hoping to get some buy-in on the concept, but so far no luck.
I might still implement it just to show it can be done, but I don't
currently have an OpenSolaris dev environment set up, and it seems like an
awful lot of initial investment just to be able to point to something and
say "Boy, it sure would be cool if they accepted that into the code base"
:(.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Fredrich Maney wrote:

 We allow people to choose between filesystems, volume managers, password
 encryption algorithms, profiles, etc. Why not allow them to pick one
 file security model, another, or both?

Choice is good :).

 Now, of course, the devil is in the details of implementation. Do we make
 it system wide (a la a setting in some file in /etc/security) or zpool or
 zfs dataset specific? I would think the most clean way would be to put it
 at the dataset level.

My preference would be to implement as new aclmode options, so it would be
per dataset.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Kjetil Torgrim Homme wrote:

 this is true for AUTH_SYS, too, sorry about the bad example.

Technically I suppose the server actually makes the determination about
access, but given it makes it based blindly on whatever the client tells
it, it seems it's really the client with the power.

 doesn't really affect my point.  if you just consider the filemode to be
 the lower bound for access rights, aclmode=passthrough will not give you
 any nasty surprises regardless of what clients do, *and* an ACL-ignorant
 client will get the behaviour it needs and wants.  win-win!

Lose-lose. I don't get to avail myself of the full expressive potential
ACL's provide, and ACL-ignorant clients get to screw up the permissions I
specified. Sorry, but on my data, I think the behavior I need and want
should override what some random application wants to do.

 you're not using those, are you?  they are a direct mapping of the old
 style permissions, so it would be pretty weird if they were allowed to
 diverge.

Why wouldn't I use them? They're a defined part of the ACL standard, and
objects continue to have owners and group owners. I have no issue with read
only mode bits being synthesized from the special ACE's and provided to
clients, my issue is having non-ACL aware apps trying to update mode bits
and that being translated into a lossy change to the ACL.

 you made that model.

Thanks for the compliment, but I'm afraid I did not contribute to either
the original implementation of mode bits, nor Windows CIFS ACL's, nor the
RFC for NFSv4 ACL's, nor implementation thereof in ZFS.

Surely you're not arguing that mode bits and ACL's are not different
security models?

 you don't have to.  just subscribe to the principle of least security,
 and it just works.

I don't want least security. I want best security. Again, you're pretty
much saying I wouldn't have a problem if I chose not to have a problem. I'm
not quite sure what your point is. If my car engine caught on fire every
time I exceeded 50 mph, would you say that rather than taking it to the
dealer to get fixed, I should simply never exceed 50 mph?

 nice insult.

It wasn't an insult, it was an objective observation. You have made what I
believe to be factually incorrect claims about how ACL's work and are
implemented.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-02 Thread Paul B. Henson
On Tue, 2 Mar 2010, Kjetil Torgrim Homme wrote:

 you haven't demonstrated why the current capabilities are insufficient
 for your requirements.  it's a bit hard to offer advice for perceived
 problems other than reconsider your perception.

I think I've made it pretty clear that I want to control access by ACL, and
not allow attempts to manipulate legacy mode bits to change the ACL, and
that given current capabilities it's difficult to impossible to do so. It
also seems clear from your responses you don't think that's an appropriate
goal to be seeking. If you can offer alternative mechanisms within the
current capability set that can achieve that goal, I'd love to hear them; but
claims that I wouldn't have a problem if I didn't consider the current
state of affairs to be a problem aren't particularly useful.




Re: [zfs-discuss] Large scale ZFS deployments out there (200 disks)

2010-03-01 Thread Paul B. Henson
On Sat, 27 Feb 2010, Jens Elkner wrote:

 At least on S10u8 it's not that bad. Last time I patched and rebooted
 a X4500 with ~350 ZFS it took about 10min to come up, a X4600 with
 a 3510 and ~2350 ZFS took about 20min (almost all are shared via NFS).

Our X4500s, with about 8000 filesystems each, take about 50-60 minutes to
shut down and about the same to boot up, resulting in about a 2 hour reboot
cycle, which is kind of annoying. The lack of scalability is in the NFS
sharing; it only takes ~5 minutes to mount all 8000. I hope someday they'll
optimize that a bit...




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-01 Thread Paul B. Henson
On Sun, 28 Feb 2010, Kjetil Torgrim Homme wrote:

 why are you doing this?  it's inherently insecure to rely on ACL's to
 restrict access.  do as David says and use ACL's to *grant* access.  if
 needed, set permission on the file to 000 and use umask 777.

Umm, it's inherently insecure to rely on Access Control Lists to, well,
control access? Doesn't that sound a bit off?

The only reason it's insecure is because the ACL's don't stand alone,
they're propped up on a legacy chmod interoperability house of cards which
frequently falls down.

 why is umask 022 when you want 077?  *that's* your problem.

What I want is for my inheritable ACL's not to be mixed in with legacy
concepts. ACL's don't have a umask. One of the benefits of inherited ACL's
is that you don't need to globally pick 022 ("let people see what I'm up
to") vs 077 ("hide it all"). You can just create files, with the confidence
that every one you create will have the appropriate permissions as configured.

Except, of course, when they're commingled with incompatible security
models. Basically, it sounds like you're arguing I shouldn't try to fix
ACL/chmod issues because ACL's are insecure because they have chmod issues
8-/.
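As a rough illustration of why inheritance removes the need for a global umask, here is a simplified C sketch of file creation under an inheritable ACL. The `struct ace` layout, `who_id`, the `FILE_INHERIT` constant, and `inherit_for_file()` are hypothetical stand-ins; NFSv4/ZFS ACEs carry more flags, such as directory and no-propagate inheritance.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag; NFSv4/ZFS call the real one "file_inherit". */
#define FILE_INHERIT 0x1u

struct ace {
    unsigned who_id;  /* hypothetical principal identifier */
    unsigned perms;   /* rwx bits only */
    unsigned flags;
};

/* Build a new file's ACL from its parent directory's ACL: only
 * entries marked FILE_INHERIT are copied, and the copy drops the
 * inheritance flag since a file has no children. No umask appears
 * anywhere in the computation. Returns the number of entries
 * written to out (at most cap). */
size_t inherit_for_file(const struct ace *parent, size_t n,
                        struct ace *out, size_t cap)
{
    assert(parent != NULL || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n && k < cap; i++) {
        if (parent[i].flags & FILE_INHERIT) {
            out[k] = parent[i];
            out[k].flags = 0;
            k++;
        }
    }
    return k;
}
```

Each directory carries its own policy, so a private tree and a shared tree can coexist without anyone changing their umask.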




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-03-01 Thread Paul B. Henson
On Mon, 1 Mar 2010, Nicolas Williams wrote:

 Yes, that sounds useful.  (Group modebits could be applied to all ACEs
 that are neither owner@ nor everyone@ ACEs.)

That sounds an awful lot like the POSIX mask_obj, which was the bane of my
previous filesystem, DFS, and for which (history seemingly repeating itself)
I was also unable to get an option implemented to ignore it and allow ACL's
to work without impediment.
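For comparison, POSIX.1e-draft ACLs (as implemented on Linux, for example) cap the effective permissions of named users, the owning group, and named groups with the mask entry, and changing the group mode bits of a file that has an extended ACL changes that mask. A minimal C sketch of the rule; the `enum tag` names are defined locally here rather than taken from `<sys/acl.h>`.

```c
#include <assert.h>

/* POSIX.1e-draft entry kinds, defined locally for a self-contained
 * sketch (real code would use acl_tag_t values from <sys/acl.h>). */
enum tag { TAG_USER_OBJ, TAG_USER, TAG_GROUP_OBJ, TAG_GROUP,
           TAG_MASK, TAG_OTHER };

/* Effective rwx bits for one entry: the mask caps the "group class"
 * (named users, the owning group, named groups), while the owner and
 * "other" entries are never masked. */
unsigned effective_perms(enum tag t, unsigned perms, unsigned mask)
{
    assert(perms <= 07 && mask <= 07);
    switch (t) {
    case TAG_USER:
    case TAG_GROUP_OBJ:
    case TAG_GROUP:
        return perms & mask;
    default:
        return perms;
    }
}
```

So a `chmod g-rwx` on such a file sets the mask to zero and silently disables every named entry at once.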

 If users have private primary groups then you can have them run with
 umask 007 or 002 and use set-gid and/or inheritable ACLs to ensure that
 users can share files in specific directories.  (This is one reason that
 I recommend always giving users their own private primary groups.)

The only reason for the recommendation to give users their own private
primary groups is because of the lack of flexibility of the umask/mode bits
security model. In an environment with inheritable ACL's (that aren't
subject to being violated by that legacy security model) there's no real
need.

 Alternatively we could have a new mode bit to indicate that the group
 bits of umask are to be treated as zero, or maybe assign this behavior
 to the set-gid bit on ZFS.

So rather than a nice simple option granting ACL's immunity from umask/mode
bits baggage, another attempted mapping/interaction?

If you only ever access ZFS via CIFS from windows clients, you can have a
pure ACL model. Why should access via local shell or NFSv4 be a poor
stepchild and chained down with legacy semantics that make it exceedingly
difficult to actually use ACL's for their intended purpose?



Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Ian Collins wrote:

 One of my clients makes extensive use of ACLs.  Some of them are so
 complex, I had to write them an application to interpret and manage them!

Yah, manipulating them directly isn't for the faint of heart ;). But it's
not too hard to abstract them to a simpler interface.

 They have a user base of around 1000, with a couple of hundred (!)
 groups.  Nearly all file access is through Samba.

How did you keep Samba from whacking the ACL's with chmod? I couldn't find
a configuration where some part of it didn't chmod something at some point.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Darren J Moffat wrote:

 Anyone sharing files over CIFS backed by ZFS is using ACLs, particularly
 when there are only Windows clients.  There are a large number of
 deployments, some very significant in size.

If you're running the OpenSolaris in-kernel CIFS server, you avoid the
POSIX compatibility layer and ZFS does actually work in a pure-ACL fashion.
OTOH, under Solaris 10, I was unable to find a Samba configuration that
didn't result in some files being hit by a chmod and losing their ACL.

 I doubt it is something people tend to talk about or publish blogs etc
 on.  That is probably the main reason you can't find them.

It's not like I'm typing "People who use ZFS ACL's" into Google and nothing
pops up; I'm inquiring in various forums generally populated by
Solaris-using people, in which a "Hey, who uses foo?" post typically finds a
fair number of respondents. Given the dearth of responses, I can only
conclude their use is not very widespread. The most frequent response so far
has been along the lines of "ACL's suck. I wish they weren't there" 8-/.

So far it's been quite a struggle to deploy ACL's on an enterprise central
file services platform with access via multiple protocols and have them
actually be functional and reliable. I can see why the average consumer
might give up.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Thomas Burgess wrote:

 I think most people are just confused by ACL's, i know i was when i first
 started using them.  Having said that, once i got them set correctly,
 they work very well for my CIFS shares.

Are you using the in-kernel CIFS server or samba? Are the files ever
accessed via NFS or local shell?




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

 Can you describe your struggles?  What could we do to make it easier to
 use ACLs?  Is this about chmod [and so random apps] clobbering ACLs? or
 something more fundamental about ACLs?

I understand and accept that ACL's are complicated, and have no issues with
that. My current struggle is that other than in a few restricted use cases,
they cannot be relied on to serve their purpose, as it is far too easy for
an accidental chmod (frequently in an unexpected and unnoticed context) to
wipe them out.

Even Solaris itself is guilty of this:


http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037249.html

If you're trying to use ACL's in a general purpose deployment involving
access by applications which are ACL-ignorant, and over NFS to other
operating systems which might not even have ACL's themselves, I do not
believe there is any way with the current implementation to do so
successfully. Something is going to run chmod on a file or directory, and
the ACL will be broken.

I've already posited as to an approach that I think would make a pure-ACL
deployment possible:


http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037206.html

Via this concept or something else, there needs to be a way to configure
ZFS to prevent the attempted manipulation of legacy permission mode bits
from breaking the security policy of the ACL.

If anyone has thoughts on a different approach that would achieve the same
goal, I'd love to hear about it. But I'm not sure how you could do that as
long as the ACL is so easily mangled.

Thanks...



Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Jason King wrote:

 Did you try adding:

nfs4: mode = special
vfs objects = zfsacl

 To the shares in smb.conf?  While we haven't done extensive work on
 S10, it appears to work well enough for our (limited) purposes (along
 with setting the acl properties to passthrough on the fs).

Yes, I've got that configuration. The ACL's are seen and manipulated from a
Windows client fine. The problem is that something in Samba occasionally
chmod's stuff, which breaks the ACL. I'm not clear on exactly the
circumstances, but I was unable to make it stop. We disabled unix extensions
and all of the DOS-attribute-to-mode-bit mappings, but it would still screw
up ACL's as things were copied or moved around.




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Bill Sommerfeld wrote:

 I believe this proposal is sound.

Mere words can not express the sheer joy with which I receive this opinion
from an @sun.com address ;).

 There are already per-filesystem tunables for ZFS which allow the system
 to escape the confines of POSIX (noatime, for one); I don't see why a
 "chmod doesn't truncate acls" option couldn't join it so long as it was
 off by default and left off while conformance tests were run.

It always frustrates me when cutting edge technology is artificially
hampered by the chains and straitjacket of an obsolete (or at least not
necessarily relevant to the problem at hand) standard. I had the same
problem with our previous DCE/DFS environment and the POSIX mask_obj.
Compliance with standards is good, but also having the option to knowingly
disregard them is even better :).

There are (as always) various pesky details that need to be ironed out. For
example, it should probably only apply to objects with a non-trivial ACL;
ones with a trivial ACL should still be chmod'able for compatibility.
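The trivial/non-trivial distinction could be expressed roughly as follows. This is a simplified C sketch under stated assumptions: `acl_is_trivial()` only checks for named entries and inheritance flags, and `gate_chmod_deny()` models the proposed (hypothetical) aclmode=deny, not any shipping ZFS behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum who { OWNER_AT, GROUP_AT, EVERYONE_AT, NAMED };

struct ace {
    enum who who;
    unsigned flags; /* nonzero = inheritance flags present */
};

/* Simplified notion of a "trivial" ACL: only the three special
 * entries and no inheritance flags, i.e. nothing the mode bits could
 * not already express. Real ZFS also examines deny entries and the
 * exact permission sets. */
bool acl_is_trivial(const struct ace *acl, size_t n)
{
    assert(acl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (acl[i].who == NAMED || acl[i].flags != 0)
            return false;
    return true;
}

/* Hypothetical chmod gate for the proposed aclmode=deny: trivial
 * ACLs stay chmod'able for compatibility, non-trivial ones refuse. */
int gate_chmod_deny(const struct ace *acl, size_t n)
{
    return acl_is_trivial(acl, n) ? 0 : EPERM;
}
```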

There's also the question of what to do with the non-access-control pieces
of the legacy mode bits that have no ACL equivalent (suid, sgid, sticky
bit, et al). I think the only way to set those is with an absolute chmod,
so there'd be no way to manipulate them in the current implementation
without whacking the ACL. That's likely done relatively infrequently, and
those bits could always be set before the ACL is applied. In our current
deployment the only one we use is sgid on directories, which is inherited,
not directly applied.
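Why an absolute chmod is unavoidable can be seen from how a relative request such as `chmod g+s` has to be turned into a chmod(2) call. A minimal C sketch; `with_setgid()` and `changed_bits()` are illustrative helpers, not library functions.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* The octal values in the usage below assume the traditional layout. */
static_assert(S_ISGID == 02000, "unexpected S_ISGID value");

/* chmod(2) only accepts an absolute mode, so a relative "g+s" is
 * implemented roughly as stat() followed by chmod(old | S_ISGID):
 * the nine permission bits are re-sent even though only the setgid
 * bit was meant to change. */
mode_t with_setgid(mode_t old)
{
    return (old & 07777) | S_ISGID;
}

/* Bits that actually differ between the old and new absolute modes;
 * a filesystem would have to reconstruct this to know what the
 * caller "meant". */
mode_t changed_bits(mode_t old, mode_t new_mode)
{
    return (old ^ new_mode) & 07777;
}
```

For a 0750 directory, `with_setgid()` yields 02750: only the setgid bit differs, yet the call still carries all nine permission bits, and a filesystem that maps them onto the ACL cannot tell they were not meant as a change.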

I was hoping to find some ZFS engineers that might be interested in tossing
the concept back and forth to the point where it was workable, but so far
no luck. It looks like you work more in the network security area? Ignoring
the zfs specific details, from an abstract security perspective, it seems
generally not good to be able to so easily and unintentionally subvert
explicitly configured security policy :(.

I've got an open case, SR#72456444, regarding chmod/ACL conflicts, if
anybody would like to help it along :).

Thanks...




Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

 I believe we can do a bit better.

 A chmod that adds (see below) or removes one of r, w or x for owner is a
 simple ACL edit (the bit may turn into multiple ACE bits, but whatever)
 modifying / replacing / adding owner@ ACEs (if there is one).  A similar
 chmod affecting group bits should probably apply to group@ ACEs.  A
 similar chmod affecting other should apply to any everyone@ ACEs.

I don't necessarily think that's better; and I believe that's approximately
the behavior you can already get with aclmode=passthrough.

If something is trying to change permissions on an object with a
non-trivial ACL using chmod, I think it's safe to assume that's not what
the original user who configured the ACL wants. At least, that would be
safe to assume if the user had explicitly configured the hypothetical
aclmode=deny or aclmode=ignore :).

Take, for example, a problem I'm currently having on Linux clients mounting
ZFS over NFSv4. Linux supports NFSv4, and even has a utility to manipulate
NFSv4 ACL's that works ok (but isn't nearly as nice as the ACL integrated
chmod command in Solaris). However, the default behavior of the linux cp
command is to try and copy the mode bits along with the file. So, I copy
a file into zfs over the NFSv4 mount from some local location. The file is
created and inherits the explicitly configured ACL from the parent
directory; the cp command then does a chmod() on it and the ACL is broken.
That's not what I want, I configured that inheritable ACL for a reason, and
I want it respected regardless of the permissions of the file in its
original location.
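The sequence described above amounts to roughly the following. This is a sketch of the create-then-chmod pattern, not the actual coreutils source; `create_with_src_mode()` is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create dst, then force its permission bits to match src. On a ZFS
 * dataset the open() is where the inherited ACL gets applied; the
 * fchmod() that follows is what then rewrites it from plain mode
 * bits. Returns an open fd on success or -1 on error. */
int create_with_src_mode(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);
    struct stat st;
    if (stat(src, &st) != 0)
        return -1;
    int fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (fd < 0)
        return -1;
    if (fchmod(fd, st.st_mode & 07777) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the chmod happens after the file already exists, no inheritance rule can help; only a filesystem that ignores or refuses the chmod would preserve the inherited ACL.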

Another instance is an application that doesn't seem to trust creat() and
umask to do the right thing, after creating a file it explicitly chmod's it
to match the permissions it thinks it should have had based on the
requested mode and the current umask. If the file inherited an explicitly
specified non-trivial ACL, there's really nothing that can be done about
that chmod, other than ignore or deny it, that will result in the
permissions intended by the user who configured the ACL.
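That pattern looks roughly like this in C. `current_umask()` and `create_and_rechmod()` are hypothetical helper names; reading the umask by setting it and immediately restoring it is a common idiom, but it is not thread-safe.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* umask(2) can only be read by setting it, so set 0 and immediately
 * restore the previous value (not thread-safe). */
mode_t current_umask(void)
{
    mode_t m = umask(0);
    umask(m);
    return m;
}

/* Create the file, then chmod it to (requested & ~umask) in case
 * creat() "got it wrong". With an inherited non-trivial ACL, this
 * explicit chmod is the step that damages the ACL.
 * Returns an open fd on success or -1 on error. */
int create_and_rechmod(const char *path, mode_t requested)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, requested);
    if (fd < 0)
        return -1;
    if (fchmod(fd, requested & ~current_umask() & 07777) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With a umask of 022 and a requested mode of 0666 the file ends up 0644; on a plain filesystem without an inherited ACL the fchmod() merely repeats what open() already did.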

 For set-uid/gid and the sticky bits being set/cleared on non-directories
 chmod should not affect the ACL at all.

Agreed.

 For directories the sticky and setgid bits may require editing the
 inherittable ACEs of the ACL.

Sticky bit yes; in fact, as it affects permissions, I think I'd lump that into
the ignore/deny category. sgid on directories, though? That doesn't
explicitly affect permission, it just potentially changes the group
ownership of new files/directories. I suppose that indirectly affects
permissions, as the implicit group@ ACE would be applied to a different
group, but that's probably the intention of the person setting the sgid
bit, and I don't think any actual ACL entry changes should occur from it.

 chmod(2) always takes an absolute mode.  ZFS would have to reconstruct
 the relative change based on the previous mode...

Or perhaps some interface extension allowing relative changes to the
non-permission mode bits? For example, chown(2) allows you to specify -1
for either the user or group, meaning "don't change that one." mode_t is
unsigned, so negative values won't work there, but there are a ton of
extra bits in an unsigned int that aren't relevant to the mode. Perhaps
setting one of them could signify that only the non-permission mode bits
should be manipulated:

chmod(foo, 012000) // turn on sgid bit
chmod(foo, 010000) // turn off sgid bit
chmod(foo, 014000) // turn on suid bit
chmod(foo, 010000) // turn off suid bit
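A minimal sketch of how a filesystem might interpret such a request. Everything here is hypothetical: CHMOD_SPECIAL_ONLY is not a real chmod(2) flag, and it is taken as 010000, the bit shared by 012000 and 014000 in the examples above. apply_chmod only models the resulting mode and touches no file. Under this reading the special bits are still absolute, so 014000 would also clear an existing sgid bit.

```c
#define _XOPEN_SOURCE 700   /* make S_ISVTX visible on strict compilers */
#include <sys/types.h>
#include <sys/stat.h>

/* HYPOTHETICAL flag: "only the setuid/setgid/sticky bits are being set".
 * Not a real chmod(2) flag. */
#define CHMOD_SPECIAL_ONLY ((mode_t)010000)

/* Mode bits that don't grant access: setuid, setgid and sticky. */
#define SPECIAL_BITS ((mode_t)(S_ISUID | S_ISGID | S_ISVTX))

/* Model of a chmod() under the proposed extension. cur holds the object's
 * current mode bits; the return value is what would be stored afterwards. */
mode_t apply_chmod(mode_t cur, mode_t req)
{
    if (req & CHMOD_SPECIAL_ONLY) {
        /* Keep the existing rwx bits (and so the ACL); the special bits
         * are still taken absolutely from req. */
        return (cur & 0777) | (req & SPECIAL_BITS);
    }
    /* Ordinary absolute chmod; file-type bits are ignored. */
    return req & 07777;
}
```

One caveat on the choice of bit: 010000 is also the S_IFIFO file-type bit, so a program that hands a FIFO's st_mode straight back to chmod() would trip the flag by accident. A real extension would need a bit nothing else uses.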

 You should probably stop using the set-gid bit on directories and use
 inheritable ACLs instead...

Hmm, I suppose that could be implemented by using an explicit group: ACE
rather than the group@ ACE, but having the group ownership of the object
match and be expressed by group@ just seems a lot cleaner. ACL's don't get
rid of the concept of user and group ownership, and I don't think
the suid/sgid concept is going to get dropped anytime soon, so we might as
well take advantage of it :).

But back to ACL/chmod; I don't think there's any way to map a permission
mode bits change via chmod to an ACL change that is guaranteed to be
acceptable to the creator of the ACL. I think there should be some form of
option available such that if an application is not ACL aware, it flat out
shouldn't be allowed to muck with permissions on an object with a
non-trivial ACL. In such a mode, only ACL operations should be allowed to
modify the permissions. They're really two separate security domains;
operations from one shouldn't be mixed with the other.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cmod(2) vs. ACLs (Re: Who is using ZFS ACL's in production?)

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

 a) clobber the ACL;
 b) map the change as best you can to an ACL change;
 c) ignore the rwx bits in the mode mask (except on create from a POSIX
open(2)/creat(2), in which case the ACL has to be derived from the
initial mode);
 d) fail the chmod().

Option d, I believe, maps to my proposed aclmode=deny; option c, I *think*,
lines up with my aclmode=discard, and even takes care of the issue of
flipping the suid/sgid et al. bits, as an absolute chmod of 02000 would turn
on sgid and the ugo parts would be ignored (and the ACL would only need to
be derived if the three special ACE's aren't specified by inheritance,
which they probably would be if somebody configured option c).

a and b are both currently available, what do I need to do to get you on
board with implementing c and d ;)?

 All three can be surprising!

Agreed. No one, two, three, or even four different ways of handling this
issue will meet the needs of every possible deployment.
But without getting into ridiculous levels of complexity, having some
reasonable number of options available seems highly desirable.

Evil thought -- implement a way to attach a custom chmod-ACL mapper to a
zfs filesystem allowing some basic scripting language to specify what
happens. Then everybody could make it do exactly what they want :).


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

 Suppose you deny or ignore chmods.  Well, how would you ever set or reset
 set-uid/gid and sticky bits?  chmod(2) deals only in absolute modes, not
 relative changes, which means that in order to distinguish those bits
 from the rwx bits the filesystem would have to know the file's current
 mode bits in order to compare them to the new bits -- but this is hard
 (see my other e-mail in a new sub-thread).  You'd have to remove the ACL
 then chmod; oof.

You actually answered that in your previous email with option c. Ignore the
ugo bits of the argument to chmod, and only process the suid/sgid/sticky
bits. The filesystem does know the current mode bits when chmod is called,
doesn't it?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/zfs_znode.h

Line 145: the zp_mode value in the znode_phys_t structure, labeled "file
mode bits." At any given time, unless I'm mistaken, this value stores the
legacy mode bits for an object, distinct and separate from the ACL.

It seems it would be fairly trivial to implement an aclmode which only
applied the suid/sgid/sticky bit part of the argument to chmod and ignored
the rest, leaving it as is.
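A minimal sketch of that option-c behavior, assuming a hypothetical helper that is handed the stored mode (the zp_mode value mentioned above) and the absolute chmod() argument. The function name and the acl_is_trivial parameter are illustrative assumptions, not actual ZFS code.

```c
#define _XOPEN_SOURCE 700   /* make S_ISVTX visible on strict compilers */
#include <sys/types.h>
#include <sys/stat.h>

#define SPECIAL_BITS ((mode_t)(S_ISUID | S_ISGID | S_ISVTX))
#define PERM_BITS    ((mode_t)(S_IRWXU | S_IRWXG | S_IRWXO))

/* HYPOTHETICAL "ignore rwx" policy.
 * stored:         mode bits currently recorded for the object
 * req:            absolute mode passed to chmod()
 * acl_is_trivial: nonzero if the ACL carries nothing beyond the mode bits
 * Returns the mode bits to record after the call. */
mode_t chmod_ignore_rwx(mode_t stored, mode_t req, int acl_is_trivial)
{
    if (acl_is_trivial)
        return req & (PERM_BITS | SPECIAL_BITS);   /* plain POSIX chmod */

    /* Non-trivial ACL: take only suid/sgid/sticky from req; the rwx
     * bits, and therefore the ACL, are left as they were. */
    return (stored & PERM_BITS) | (req & SPECIAL_BITS);
}
```

With a non-trivial ACL, chmod(dir, 02000) turns on sgid and nothing else. Because the special bits are still absolute, chmod(dir, 0777) would clear sgid while leaving the rwx bits alone.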

 Can you make that utility avoid the chmod?  The mode bits should come
 from the open(2)/creat(2), and there should be no need to set them again
 after setting the ACL.

I think there is an option not to copy the mode bits. But it copies them by
default, and I don't really want to try to get every person on campus who
mounts their files via NFSv4 from a Linux system to change the
default behavior of a base utility. And they might even want that behavior
on the local filesystem.

 Such an app is broken.

Yes it is. But even if I could fix it (assuming it's not a proprietary
binary), there would be another one after it. And then another. And
another. The only way to fully fix this issue for all possible instances of
bad defaults or broken applications is for the filesystem itself to enforce
it.

 But we'd have to extend NFSv4 and get the extension adopted and
 deployed.  There's no chance of such a change being made in a short
 period of time -- we're talking years.

No need; based on your other email and a little code digging I think the
ignore option could be implemented entirely within the zfs code, allowing
manipulation of suid/sgid without changing ugo bits, with no change in
behavior or interface required by anything else.

 But is an application that sets an ACL and chmods ACL-aware?  How can the
 filesystem tell?  (Answer: it can't really, as it may not be able to
 relate the two operations.)

My definition of an ACL-aware application is one that *never* tries to
manipulate legacy mode bits on an object with a non-trivial ACL. Based on
that definition, it's easy to tell :). And if an ACL aware application
wants to play with mode bits, first it should use the ACL API to set a
trivial ACL on the object, at which point chmod and mode bits would work
fine.

 As I wrote in that new sub-thread, I see no option that isn't surprising
 in some way.  My preference would be for what I labeled as option (b).

And I think you absolutely should be able to configure your fileserver to
implement your preference. Why shouldn't I be able to configure my
fileserver to implement mine :)?


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Bill Sommerfeld wrote:

 acl-chmod interactions have been mishandled so badly in the past that i
 think a bit of experimentation with differing policies is in order.

I volunteer to help test discard and deny :). Heck, I volunteer to help
*implement* discard and deny...

 Based on the amount of wailing I see around acls, I think that, based on
 personal experience with both systems, AFS had it more or less right and
 POSIX got it more or less wrong -- once you step into the world of acls,
 the file mode should be mostly ignored, and an accidental chmod should
 *not* destroy carefully crafted acls.

We prototyped an AFS deployment for a while (it was the closest thing to
our existing DFS available). The location independence was great (I got
spoiled under DFS with the ability to transparently migrate data between
servers while in use), but the inability to apply an ACL to a file kind of
sucked. I guess you could have every file be in its own individual
subdirectory with the parent directory having a symlink to it to simulate
per-file ACL's, but talk about kludgy.

I'm actually much happier with our ZFS deployment (other than a couple of
ongoing unresolved scalability issues and this acl issue). But I can't
agree with you more that an undesired chmod should not destroy carefully
crafted acls. Now if I could only get a ZFS engineer to share that
viewpoint :).


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson

I was in the middle of a lengthy reply to this, which I've abandoned, as it
can pretty much be summarized as "If you don't want this behavior, don't
enable it."

It wouldn't be the default, and if you didn't want it, you wouldn't enable
it. Perhaps it might be enabled on some system you inherit, but in that
case whoever originally turned it on must have wanted it. So you'd change
it to suit your needs. There's *lotso* stuff on a hand-me-down system
that's probably not configured the way you want :).

I'm not trying to force a particular behavior on anybody. I just want an
optional behavior available to meet the specific needs of my deployment.


On Fri, 26 Feb 2010, David Dyer-Bennet wrote:

 The problem with that, of course, is that it's equally true in a
 pure-permissions world -- if I'm trying to change the permissions with
 chmod, it's safe to assume that the new values aren't what the person
 who originally configured the protections on that file wanted.  THAT'S
 WHY I'M CHANGING THEM!

 So I don't see how that's a great argument for ignoring what I do.
[...]
 Okay, but the argument goes the other way just as well -- when I run
 chmod 6400 foobar, I want the permissions set that specific way, and I
 don't want some magic background feature blocking me.  Particularly if
 I am a complex system of scripts that wasn't even written locally.



Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, David Dyer-Bennet wrote:

 So, even if you're willing to completely discard 30 years of legacy
 scripts and applications -- how do you propose that a NEW script or
 application should be written so as to work in this brave new
 environment?
[...]
 And how should new utilities be written to take the place of the 30
 years of work you're throwing out?  I don't yet see how it can be done.

First of all, you make a choice. Maybe the correct operation of some
30-year-old script is most important to you. So you set an aclmode so it
works. But maybe making sure your sensitive data file doesn't get
accidentally exposed to the world via an unexpected hidden chmod in a
30-year-old script is more important than that script working. So you set an
aclmode so your ACL doesn't get destroyed. It's your choice. Choice is
good.

Second, you're not necessarily discarding all of those legacy
scripts/applications. You're just making sure they don't screw up your
ACL's. Take the example of the editor that chmod's a file and you don't
want it to (but it's a binary app and you can't make it stop). Configuring
zfs to ignore the chmod doesn't break the application. The editor continues
to edit fine. It just doesn't destroy your ACL. Win-win.

If there's some app/script for which changing permissions is essential to
its operation, but it only understands mode bits, then either the security
provided by mode bits is sufficient, so you configure aclmode so it works.
Or the security provided by mode bits isn't sufficient, so you replace the
app/script with one that understands ACLs. Using the published ACL API. man
-s 2 acl ;). You can claim it might be a lot of work, but I'm not sure how
you could claim it can't be done.


[zfs-discuss] Who is using ZFS ACL's in production?

2010-02-25 Thread Paul B. Henson

I've been surveying various forums looking for other places using ZFS ACL's
in production to compare notes and see how, if at all, they've handled some
of the issues we've found deploying them.

So far, I haven't found anybody using them in any substantial way, let
alone trying to leverage them to allow a very large user population to have
highly flexible control over access to their data.

Anyone here that has a non-negligible ACL deployment that would be
interested in discussing it?

Thanks...


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-25 Thread Paul B. Henson
On Thu, 25 Feb 2010, Marion Hakanson wrote:

 It's not easy to get them right, and usually the hardest task is in
 figuring out what the users want, so we don't use them unless the users'
 needs cannot be met using traditional Unix/POSIX permissions.

We've got a web GUI that hides the complexity from end users; they pretty
much just see a list of files/directories and pick users/groups with
read-only or read/write access. We offer access to central user/group space via CIFS
(currently samba in s10, with an eye towards opensolaris/in-kernel server),
web, scp/sftp, kerberized NFSv4, and interactive unix login. The web server
enforces ACL's, so users can restrict their web content by them. We've had
a similar environment based on DCE/DFS for over 10 years (we're just about
ready to shut that down, having completed migrating everything over) and
our users have become quite acclimatized to the flexibility that this gives
them to set access control once and have it respected regardless of access
protocol.

 and using inheritance to propagate them to any new items which are added
 to shared areas.

What protocols/access methods are used to get to the underlying zfs
filesystem? The main ACL problem we're having now (having resolved most of
them, yay) is interaction with chmod() and legacy mode bits, and the
disappointing ease with which an undesired chmod can completely destroy an
ACL. I finally ended up having to preload a shared library that disables
chmod() into samba, which resolved our issues for CIFS. I still haven't
found a way to keep users' ACL's from being wiped by rogue non-ACL-aware
commands/utilities (including, as it turns out, Solaris' own chgrp command).
Unfortunately, there's currently no global way to prevent manipulation of
legacy mode bits from destroying ACL's. I've been working around particular
instances of the problem (such as the preloaded shared library for samba,
or using chown :group rather than chgrp), but it's a losing battle.
There are 40-odd years of non-ACL-aware stuff out there; it's intractable
to try and fix it all on a case by case basis. Given ZFS's description as
having a pure acl model, it really seems there should be some way to
prevent those ACL's from getting wiped out at the drop of a chmod.

 The scripting also (sorta) covers the problem that most backup and file
 transfer utilities are not capable of backing up and restoring the
 NFSv4-style ACL's on ZFS.

We have a central Netbackup deployment. Supposedly it supports ZFS ACL's,
although I've never actually tested that. I suppose I should at some point
8-/.

If we can't find some clean way to maintain ACL sanity, we might have to
start storing ACL's in metadata files (maintained in lockstep by our web
control app), and having a job run around and restore broken ACL's on
files/directories on an ongoing basis. That would be pretty kludgy :(, but
better than sensitive data suddenly becoming world-readable because
somebody's favorite editor feels the need to chmod a file after
creation to match the permissions it thinks it should have had
based on the umask. Talk about a security deficiency <sigh>.

Thanks for the info...


Re: [zfs-discuss] /usr/bin/chgrp destroys ACL's?

2010-02-11 Thread Paul B. Henson
On Wed, 10 Feb 2010, David Dyer-Bennet wrote:

 My experience with ACLs is that they suck dead diseased rats through a
 straw and I wish I could turn them off.

That seems overly harsh ;).

 What I would dearly love is an option to disable all ACL support.

If you never explicitly use ACL's on zfs, and only ever manipulate
permissions with legacy chmod mode bits, I believe zfs will behave like
ACL's don't exist. In what scenario does the existence of ACL's cause
failures for use cases that don't apply them?


[zfs-discuss] /usr/bin/chgrp destroys ACL's?

2010-02-10 Thread Paul B. Henson

We have an open bug which results in new directories created over NFSv4
from a linux client having the wrong group ownership. While waiting for a
patch to resolve the issue, we have a script running hourly on the server
which finds directories owned by the wrong group and fixes them.

One of our users complained that the ACL's on some of their content were
broken, and upon investigation we determined /usr/bin/chgrp is breaking
them:

drwxrws--x+  2 root iit_webdev   2 Feb 10 16:29 testdir
owner@:rwxpdDaARWcC--:-di---:allow
owner@:rwxpdDaARWcC--:--:allow
group@:rwxpdDaARWc---:-di---:allow
group@:rwxpdDaARWc---:--:allow
group:iit_webdev-admin:rwxpdDaARWcC--:-di---:allow
group:iit_webdev-admin:rwxpdDaARWcC--:--:allow
 everyone@:--x---a-R-c---:-di---:allow
 everyone@:--x---a-R-c---:--:allow
owner@:rwxpdDaARWcC--:f-i---:allow
group@:rwxpdDaARWc---:f-i---:allow
group:iit_webdev-admin:rwxpdDaARWcC--:f-i---:allow
 everyone@:--:f-i---:allow

# chgrp iit testdir

drwxrws--x+  2 root iit          2 Feb 10 16:29 testdir
owner@:rwxpdDaARWcC--:-di---:allow
owner@:dDaARWcC--:--:allow
group@:rwxpdDaARWc---:-di---:allow
group@:dDaARWc---:--:allow
group:iit_webdev-admin:rwxpdDaARWcC--:-di---:allow
group:iit_webdev-admin:rwxpdDaARWcC--:--:allow
 everyone@:--x---a-R-c---:-di---:allow
 everyone@:--a-R-c---:--:allow
owner@:rwxpdDaARWcC--:f-i---:allow
group@:rwxpdDaARWc---:f-i---:allow
group:iit_webdev-admin:rwxpdDaARWcC--:f-i---:allow
 everyone@:--:f-i---:allow
owner@:--:--:deny
owner@:rwxp---A-W-Co-:--:allow
group@:--:--:deny
group@:rwxp--:--:allow
 everyone@:rw-p---A-W-Co-:--:deny
 everyone@:--x---a-R-c--s:--:allow

Sure enough, per truss:

chmod(testdir, 02771) = 0

Looking at the chgrp man page:

 Unless chgrp  is  invoked  by  a  process  with  appropriate
 privileges, the set-user-ID and set-group-ID bits of a regu-
 lar file will be cleared  upon  successful  completion;  the
 set-user-ID and set-group-ID bits of other file types may be
 cleared.

Well, I'm running the chgrp as *root*, and it's not *clearing* the existing
setgid bit on the directory, it's *adding* it when it's already there. Why?
It seems completely unnecessary and **breaks the ACL**.
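To make the complaint concrete: going by the man page text above, the only reason for chgrp to call chmod() afterwards is to clear setuid/setgid on regular files when the caller is unprivileged. The sketch below is a hypothetical helper, not the Solaris chgrp source. It decides whether that call is needed at all. For root on the directory above it is not, so chmod(testdir, 02771) is pure redundancy.

```c
#define _XOPEN_SOURCE 700   /* make S_IFDIR/S_IFREG visible on strict compilers */
#include <sys/types.h>
#include <sys/stat.h>

/* HYPOTHETICAL helper: nonzero if a chgrp-style tool actually needs to
 * call chmod() after changing the group. Per the man page quoted above,
 * unprivileged callers clear setuid/setgid on regular files; other file
 * types are left alone here, since the man page only says those "may"
 * be cleared. */
int chgrp_needs_chmod(mode_t st_mode, int privileged)
{
    mode_t cur = st_mode & 07777;
    mode_t want = cur;

    if (!privileged && S_ISREG(st_mode))
        want &= ~(mode_t)(S_ISUID | S_ISGID);

    /* Only issue chmod() when it would change something. */
    return want != cur;
}
```

The group change itself never requires touching the mode. chown(path, (uid_t)-1, gid) leaves the owner as-is and changes only the group.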

This is yet another instance of the general problem I posted about
yesterday:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34588.html

to which I have so far received no comments (Dozens of people can spend
over a week arguing about the cost effectiveness of Sun branded storage ;),
and not a single person is interested in an endemic ACL problem?).

I was completely unsuccessful at getting samba under Solaris 10 to stop
gratuitously chmod()'ing stuff, so I ended up preloading a shared library
overriding the chmod call with a noop. Which works perfectly, and results
in exactly the behavior I need. But it's not really feasible to run around
and tweak every little binary out there (preload a shared library to stop
chgrp from breaking ACL's too?), which is why I think it would be an
excellent feature to let the underlying operating system deal with it --
hence aclmode ignore/deny...
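The library itself isn't shown in the post. What follows is a minimal sketch of the same idea, assuming a dynamically linked target and a runtime linker that honors LD_PRELOAD (both Solaris and Linux do). It replaces chmod() and fchmod() with versions that do nothing and report success. It is blunt: every chmod from the process is swallowed, and anything that reaches the kernel another way (fchmodat(), or a raw system call) bypasses it.

```c
/* noop_chmod.c: preload library that turns chmod()/fchmod() into no-ops.
 * Build (gcc):  gcc -shared -fPIC -o noop_chmod.so noop_chmod.c
 * Use:          LD_PRELOAD=/path/to/noop_chmod.so smbd ...
 */
#include <sys/types.h>
#include <sys/stat.h>

int chmod(const char *path, mode_t mode)
{
    (void)path;
    (void)mode;
    return 0;   /* report success; the file's mode and ACL are untouched */
}

int fchmod(int fd, mode_t mode)
{
    (void)fd;
    (void)mode;
    return 0;
}
```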


Re: [zfs-discuss] /usr/bin/chgrp destroys ACL's?

2010-02-10 Thread Paul B. Henson
On Wed, 10 Feb 2010, Jason King wrote:

 I suspect that zfs is interpreting the group ACLs and adjusting the mode
 value accordingly to try to indicate the 'preserve owner/group on new
 file' semantics with the old permissions, however it sounds like it's not
 a symmetric operation -- if chgrp sees a directory with suid or sgid set,
 it does chmod(file, original_mode & ~(S_IFMT)), when it should probably
 be more careful if ACLs are present.

The non-access-control parts of the mode bits (suid/sgid et al.) aren't part
of the acl (and can't be represented by one). I haven't looked at the
internal details, but even though zfs is a pure acl filesystem, those bits
are presumably stored differently. In an implementation like I'm looking
for, where legacy chmod is either ignored or denied for access control,
manipulating the other parts would still need to work.

In this case, it's not even like chgrp is trying to remove the sgid bit.
It's redundantly adding it when it's already there. I've opened a bug;
while I haven't gotten a response yet I'm guessing they'll eventually fix
it. That's fine for that specific instance of the general "unexpected
chmod assassinates acl, film at 11" problem, but it doesn't address all of the
other unknown problems not yet noticed. The only way to effectively address
undesirable acl destruction by chmod is at the filesystem level, by
allowing the user to disable it.

 I do think the default aclmode and aclinherit settings are unintuitive
 and quite surprising (I'd almost argue flat out wrong).

I actually wouldn't necessarily agree; the defaults are geared towards
making zfs behave like a typical unix filesystem, preserving POSIX
compliance and making sure legacy apps work as expected. That's fine as a
default, but the ability to make zfs actually functional as a pure acl
environment should be available as an option.

 I've found setting aclmode and aclinherit to passthrough saves what
 little hair I have left.  If you haven't tried that already, might want
 to see if that helps any.

I did that from day 1 :)... I'm happy at this point with aclinherit
(barring one minor tweak to passthrough-x that's already had an approved
fasttrack and is just waiting for opensolaris 3/10 to get released and the
gate reopened for changes), if I could just get some aclmode enhancements I
think we'd be set.

 My experience (perhaps others will have different experiences) is that
 due to the added complexity and administrative overhead, ACLs are used
 when it's absolutely necessary -- i.e. you have something that due to
 its nature must have very explicit and precise access control.

NFSv4 ACL's are ridiculously overcomplicated for the end user; but that's
what pretty web gui's are for ;). We actually have one of those, allowing
end users to easily control access to their files, and it would be working
great, other than that every time I turn around the ACL's are broken by
something or another :(.

 The last thing I want is the system going behind my back and silently
 modifying the permissions I'm trying to set, and leaving directories and
 files with permissions other than what was set (which is what you get
 today with the defaults).

Ditto. I put that acl there for a reason, and I don't want anything
trying to treat it like legacy mode bits.

zfs is a great filesystem, and the CIFS compatible NFSv4 ACL's provide
great potential for cross platform interoperability, but despite being a
pure acl filesystem I don't think there's any feasible way at this point to
allow access via NFSv4 or local shell and not break acl's one way or
another.


[zfs-discuss] alternative aclmode options -- ignore? deny?

2010-02-09 Thread Paul B. Henson

I was having a conversation a few weeks ago about the various possible ways
of handling a chmod() on a file/directory with an acl. I suggested that one
option be for it to just be ignored, as the file presumably had that acl for a
reason and the creator probably doesn't want it indiscriminately whacked.
The feedback was that the internal Sun POSIX compliance police wouldn't
like that ;).

But you know, every day I run across yet another annoying legacy script or
application or client that has no idea what an acl is and screws up my
carefully crafted permissions. I'm finding it virtually impossible to
maintain any level of sanity, and short of sweeping through all my files on
a regular basis and checking/fixing acl's, it seems there's no way to
realistically maintain them.

So why not an option to either ignore or fail any attempts to chmod() a
file/directory with a non-trivial acl? Obviously this wouldn't be the
default, and likely unwise for a root pool ;), but if someone wants an acl
pure dataset why not give it to them? The ability to be POSIX compliant is
clearly required, and to be that way by default is perhaps best, but on the
other hand providing the user features that they need should rank pretty
high on the importance scale too.

So I'd propose two additional aclmode options:

* ignore -- any chmod() attempt on a file/directory with a non-trivial acl
  is silently ignored and returns success

* deny -- any chmod() attempt on a file/directory with a non-trivial acl
  fails with EPERM.
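The two proposed settings could be modeled roughly like this. The enum values and the function are hypothetical (discard, groupmask and passthrough were the aclmode values that existed at the time), and acl_is_trivial stands in for however the filesystem would decide that an ACL carries nothing beyond the mode bits.

```c
#include <errno.h>

/* HYPOTHETICAL aclmode settings; ignore and deny are the proposals. */
enum aclmode {
    ACLMODE_DISCARD,
    ACLMODE_GROUPMASK,
    ACLMODE_PASSTHROUGH,
    ACLMODE_IGNORE,     /* proposed: silently skip, report success */
    ACLMODE_DENY        /* proposed: refuse with EPERM */
};

/* What the filesystem would do with a chmod() on a given object. */
enum chmod_action {
    CHMOD_APPLY,    /* fall through to the existing aclmode handling */
    CHMOD_SKIP,     /* leave mode and ACL unchanged */
    CHMOD_REFUSE    /* fail the call */
};

/* Decide the action; *err receives the errno to return (0 on success). */
enum chmod_action proposed_chmod_policy(enum aclmode mode, int acl_is_trivial,
                                        int *err)
{
    *err = 0;
    if (acl_is_trivial)
        return CHMOD_APPLY;         /* plain mode bits: behave as POSIX */

    switch (mode) {
    case ACLMODE_IGNORE:
        return CHMOD_SKIP;
    case ACLMODE_DENY:
        *err = EPERM;
        return CHMOD_REFUSE;
    default:
        return CHMOD_APPLY;
    }
}
```

Objects with only a trivial ACL keep normal POSIX behavior under every setting, which is what keeps this opt-in and limited to datasets where someone has actually configured ACLs.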

You can already achieve acl-pure datasets under opensolaris if you only
access them via cifs; why should nfs or local shell access not be able to
have the same functionality?


Re: [zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly

2010-01-11 Thread Paul B. Henson
On Mon, 11 Jan 2010, Eric Schrock wrote:

 No, there is no way to tell if a pool has DTL (dirty time log) entries.

Hmm, I hadn't heard that term before, but based on a quick search I take it
that's the list of data in the pool that is not fully redundant? So if a
2-way mirror vdev lost a half, everything written after the loss would be
on the DTL, and if the same device came back, recovery would entail just
running through the DTL and writing out what it missed? Although presumably
if the failed device was replaced with another device entirely all of the
data would need to be written out.
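That understanding can be illustrated with a toy model. It is a deliberate simplification, not how ZFS actually stores or walks DTLs. It assumes each block can be reduced to the transaction group (txg) it was written in, and that the DTL for the returning device is a single txg range it missed.

```c
#include <stddef.h>

/* Toy DTL: one range of transaction groups a device missed. */
struct dtl_range {
    unsigned long first_txg;
    unsigned long last_txg;
};

/* Count the blocks a resilver would have to write to a device.
 * birth[i] is the txg in which block i was written. If the same device
 * comes back, only blocks born inside the DTL range are missing from it;
 * a brand-new replacement device needs everything. */
size_t blocks_to_resilver(const unsigned long *birth, size_t n,
                          struct dtl_range dtl, int same_device)
{
    size_t count = 0;

    if (!same_device)
        return n;
    for (size_t i = 0; i < n; i++)
        if (birth[i] >= dtl.first_txg && birth[i] <= dtl.last_txg)
            count++;
    return count;
}
```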

I'm not quite sure that answered my question. My original question was, for
example: given a 2-way mirror, one half fails. There is a hot spare
available, which is pulled in, and while the pool isn't optimal, it does
have the same number of devices that it's supposed to. Alternatively,
suppose the same mirror loses a device, there's no hot spare, and the pool is short
one device. My understanding is that in both scenarios the pool status
would be DEGRADED, but it seems there's an important difference. In the
first case, another device could fail, and the pool would still be ok. In
the second, another device failing would result in complete loss of data.

While you can tell the difference between these two different states by
looking at the detailed output and seeing if a hot spare is in use, I was
just saying that it would be nice for the short status to have some
distinction between "device failed, hot spare in use" and "device failed,
keep fingers crossed" ;).

Back to your answer: if the existence of DTL entries means the pool doesn't
have full redundancy for some data, and you can't tell if a pool has DTL
entries, are you saying there's no way to tell if the current state of your
pool could survive a device failure? If a resilver successfully completes,
barring another device failure, doesn't that mean the pool is restored to
full redundancy? I feel like I must be misunderstanding something :(.

Thanks...


[zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly

2010-01-09 Thread Paul B. Henson

We just had our first x4500 disk failure (which of course had to happen
late Friday night <sigh>). I've opened a ticket on it but don't expect a
response until Monday, so I was hoping to verify the hot spare took over
correctly and we still have redundancy pending device replacement.

This is an S10U6 box:

SunOS cartman 5.10 Generic_141445-09 i86pc i386 i86pc

Looks like the first errors started yesterday morning:

Jan  8 07:46:02 cartman marvell88sx: [ID 268337 kern.warning] WARNING: marvell88sx1:device on port 2 failed to reset
Jan  8 07:46:15 cartman marvell88sx: [ID 268337 kern.warning] WARNING: marvell88sx1:device on port 2 failed to reset
Jan  8 07:46:32 cartman sata: [ID 801593 kern.warning] WARNING: /p...@0,0/pci1022,7...@2/pci11ab,1...@1:
Jan  8 07:46:32 cartman SATA device at port 2 - device failed
Jan  8 07:46:32 cartman scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1022,7...@2/pci11ab,1...@1/d...@2,0 (sd26):
Jan  8 07:46:32 cartman Command failed to complete...Device is gone

ZFS failed the drive about 11:15PM:

Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] zpool export status: One or more devices has experienced an unrecoverable error.  An
Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] zpool export status: attempt was made to correct the error.  Applications are unaffected.
Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] unknown header see
Jan  8 23:15:01 cartman zpool_check[3702]: [ID 702911 daemon.error] warning: pool export health DEGRADED
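The zpool_check lines above come from a local monitoring script whose source isn't part of this post. As a rough illustration only, here is a minimal Python sketch of that kind of check. It assumes the check parses the scripted output of "zpool list -H -o name,health" (one pool per line, tab-separated). The function name and the sample data are hypothetical.

```python
def unhealthy_pools(zpool_list_output):
    """Return (pool, health) pairs for pools that are not ONLINE.

    Assumes the scripted output of `zpool list -H -o name,health`:
    one pool per line, fields separated by whitespace (a tab with -H).
    """
    problems = []
    for line in zpool_list_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, health = fields[0], fields[1]
        if health != "ONLINE":
            problems.append((name, health))
    return problems


# Hypothetical sample: one degraded pool and one healthy pool.
sample = "export\tDEGRADED\nrpool\tONLINE\n"
for pool, health in unhealthy_pools(sample):
    print(f"warning: pool {pool} health {health}")
```

Working from the scripted (-H) output avoids having to parse the column-aligned human-readable form.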

However, the errors continue still:

Jan  9 03:54:48 cartman scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1022,7...@2/pci11ab,1...@1/d...@2,0 (sd26):
Jan  9 03:54:48 cartman Command failed to complete...Device is gone
[...]
Jan  9 07:56:12 cartman scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1022,7...@2/pci11ab,1...@1/d...@2,0 (sd26):
Jan  9 07:56:12 cartman Command failed to complete...Device is gone
Jan  9 07:56:12 cartman scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci1022,7...@2/pci11ab,1...@1/d...@2,0 (sd26):
Jan  9 07:56:12 cartman drive offline

If ZFS removed the drive from the pool, why does the system keep
complaining about it? Is fault management stuff still poking at it?

Here's the zpool status output:

  pool: export
 state: DEGRADED
[...]
 scrub: scrub completed after 0h6m with 0 errors on Fri Jan  8 23:21:31 2010


        NAME          STATE     READ WRITE CKSUM
        export        DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     DEGRADED 18.9K     0     0
              c1t2d0  REMOVED      0     0     0
              c5t0d0  ONLINE       0     0 18.9K

        spares
          c5t0d0      INUSE     currently in use

Is the pool/mirror/spare still supposed to show up as degraded after the
hot spare is deployed?

There are 18.9K checksum errors on the disk that failed, but there are also
18.9K read errors on the hot spare?

The scrub started at 11pm last night and the disk got booted at 11:15pm;
presumably the scrub came across the failures the OS had been reporting.
The last scrub status shows that scrub completing successfully. What
happened to the resilver status? How can I tell if the resilver was
successful? Did the resilver start and complete while the scrub was still
running and its status output was lost? Is there any way to see the status
of past scrubs/resilvers, or is only the most recent one available?

Fault management doesn't report any problems:

r...@cartman ~ # fmdump
TIME UUID SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty

Shouldn't this show a failed disk?

fmdump -e shows tons of bad stuff:

Jan 08 07:46:32.9467 ereport.fs.zfs.probe_failure
Jan 08 07:46:36.2015 ereport.fs.zfs.io
[...]
Jan 08 07:51:05.1865 ereport.fs.zfs.io

None of that results in a fault diagnosis?
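fmdump -e lists error reports (telemetry), while plain fmdump reads the fault log, which is empty here. To get a quick sense of how much telemetry piled up without a diagnosis, a small Python sketch like the following could tally ereports by class. Its only assumption is that the class is the last whitespace-separated field on each line, as in the excerpt above. The function name is hypothetical.

```python
from collections import Counter


def ereport_counts(fmdump_e_output):
    """Count error reports by class in `fmdump -e` style output.

    Each data line ends with the ereport class (e.g. ereport.fs.zfs.io);
    any line whose last field does not start with "ereport." is ignored.
    """
    counts = Counter()
    for line in fmdump_e_output.splitlines():
        fields = line.split()
        if fields and fields[-1].startswith("ereport."):
            counts[fields[-1]] += 1
    return counts


# Abbreviated sample taken from the excerpt above.
sample = """\
Jan 08 07:46:32.9467 ereport.fs.zfs.probe_failure
Jan 08 07:46:36.2015 ereport.fs.zfs.io
Jan 08 07:51:05.1865 ereport.fs.zfs.io
"""
for cls, n in ereport_counts(sample).most_common():
    print(f"{n:6d}  {cls}")
```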

Mostly I'd like to verify my hot spare is working correctly. Given the
spare status is degraded, the read errors on the spare device, and the
lack of successful resilver status output, it seems like the spare might
not have been added successfully.

Thanks for any input you might provide...




Re: [zfs-discuss] x4500 failed disk, not sure if hot spare took over correctly

2010-01-09 Thread Paul B. Henson
On Sat, 9 Jan 2010, Eric Schrock wrote:

  If ZFS removed the drive from the pool, why does the system keep
  complaining about it?

 It's not failing in the sense that it's returning I/O errors, but it's
 flaky, so it's attaching and detaching.  Most likely it decided to attach
 again and then you got transport errors.

Ok, how do I make it stop logging messages about the drive until it is
replaced? It's still filling up the logs with the same errors about the
drive being offline.

Looks like hdadm isn't it:

r...@cartman ~ # hdadm offline disk c1t2d0
/usr/bin/hdadm[1762]: /dev/rdsk/c1t2d0d0p0: cannot open
/dev/rdsk/c1t2d0d0p0 is not available

Hmm, I was able to unconfigure it with cfgadm:

r...@cartman ~ # cfgadm -c unconfigure sata1/2::dsk/c1t2d0

It went from:

sata1/2::dsk/c1t2d0    disk    connected    configured      failed

to:

sata1/2                disk    connected    unconfigured    failed

Hopefully that will stop the errors until it's replaced and not break
anything else :).

 No, it's fine.  DEGRADED just means the pool is not operating at the
 ideal state.  By definition a hot spare is always DEGRADED.  As long as
 the spare itself is ONLINE it's fine.
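Eric's rule of thumb is that DEGRADED is expected while a hot spare is in use, and what matters is that the spare itself is ONLINE. That is easy to check mechanically. Below is a minimal Python sketch, assuming the config section of zpool status is indented by nesting level as in the output earlier in this message. The function names are hypothetical, and parsing human-readable zpool output is not a supported interface.

```python
def parse_config(text):
    """Turn the config section of `zpool status` into (indent, name, state) rows."""
    rows = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if not stripped or stripped.startswith("NAME"):
            continue  # skip blank lines and the column header
        fields = stripped.split()
        state = fields[1] if len(fields) > 1 else ""
        rows.append((len(line) - len(stripped), fields[0], state))
    return rows


def active_spare_states(text):
    """Map each hot spare swapped into a "spare" vdev to its current state."""
    rows = parse_config(text)

    # Pass 1: collect the device names listed under the "spares" heading.
    spare_devices = set()
    spares_indent = None
    for indent, name, _state in rows:
        if name == "spares":
            spares_indent = indent
        elif spares_indent is not None:
            if indent <= spares_indent:
                spares_indent = None  # left the spares section
            else:
                spare_devices.add(name)

    # Pass 2: look at the children of every "spare" vdev in the pool tree.
    result = {}
    for i, (indent, name, _state) in enumerate(rows):
        if name != "spare":
            continue
        for child_indent, child, child_state in rows[i + 1:]:
            if child_indent <= indent:
                break  # no longer nested under this spare vdev
            if child in spare_devices:
                result[child] = child_state
    return result


sample = """\
        NAME          STATE     READ WRITE CKSUM
        export        DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     DEGRADED 18.9K     0     0
              c1t2d0  REMOVED      0     0     0
              c5t0d0  ONLINE       0     0 18.9K

        spares
          c5t0d0      INUSE     currently in use
"""
states = active_spare_states(sample)
if states and all(s == "ONLINE" for s in states.values()):
    print("spares OK")
else:
    print(f"check spares: {states}")
```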

The spare shows as INUSE, but I'm guessing that's fine too.

 Hope that helps

That was perfect, thank you very much for the review. Now I can not worry
about it until Monday :).



[zfs-discuss] CR#6850837 libshare enhancements to address performance and scalability

2009-12-09 Thread Paul B. Henson

I've had a case open for a year or so now regarding the inefficiencies of
having a large number of ZFS filesystems, in particular how long it takes
to share/unshare them (resulting in a reboot cycle time of over two hours
on my x4500 with 8000 file systems).
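For a sense of scale, here is a back-of-the-envelope figure. It rests on a simplifying assumption: that roughly two hours of the cycle are spent on share/unshare, spread evenly across file systems. The per-file-system figure then covers both the unshare at shutdown and the share at boot:

```latex
\frac{2\ \text{h} \times 3600\ \text{s/h}}{8000\ \text{file systems}} = 0.9\ \text{s per file system per reboot cycle}
```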

I got an update indicating that the bugfix mentioned in the subject was going to
resolve this, but that they did not plan to back port it to Solaris 10.
They also were not able to provide any technical details of what was fixed
or how much the performance might improve.

Would anyone happen to have any details of what changes were made, what
kind of improvements might be expected, and why it's not going to be
feasible to backport that change to Solaris 10?

Thanks...




Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-11-09 Thread Paul B. Henson
On Fri, 6 Nov 2009, James Andrewartha wrote:

 How about attacking it the other way? Sign the SCA, get a sponsor and put
 the fix into OpenSolaris, then sustaining just have to backport it.
 http://hub.opensolaris.org/bin/view/Main/participate

Do you mean the samba bug or the NFS bug?

For the samba bug, I've already submitted a patch to fix the problem.

For the NFS bug, while I have in the past pursued such options with
open-source software, considering that Solaris 10 is a commercial product
for which we're paying a fairly substantial amount for support, I'd really
prefer they fix it themselves...

 Also, since you know it's a NFS server issue now, have you tried asking
 on nfs-discuss?

Yup:

http://opensolaris.org/jive/thread.jspa?messageID=430745

No responses...




Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-11-05 Thread Paul B. Henson
On Thu, 5 Nov 2009, Miles Nordin wrote:

 allowing the first local patch into your site? or you are running a
 closed-source release where you have to roll over and beg for support?

We're running Solaris 10. It does seem like I spend an undue amount of time
lately dealing with Sun support; I have another open issue
(http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg30169.html)
that has lasted over two months, involved multiple high-level support
managers, and probably already cost the company considerably more resources
in thrashing than simply applying the patch I already provided to fix the
bug would have.

As for this NFS issue, when we initially reported the problem (which
occurred with NFSv4), they claimed NFSv3 had the exact same behavior and
since v4 worked like v3 it wasn't a bug. We didn't actually verify that,
but at a later point a Red Hat support engineer indicated it worked
correctly for him on a Solaris 10 server under v3, and only v4 was broken.
We set up a v3 test and verified that was the case, which we've now pointed
out on the support ticket, and are hoping it will actually get fixed now.




Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-11-03 Thread Paul B. Henson
On Tue, 3 Nov 2009, Ross Walker wrote:

 Maybe this isn't an interoperability fix, but a security fix as it allows
 non-Sun clients to bypass security restrictions placed on a sgid
 protected directory tree because it doesn't properly test the existence
 of that bit upon file creation.

 If an appropriate scenario can be made, and I'm sure it can, one might
 even post a CERT advisory to make sure operators are made aware of this
 potential security problem.

I agree it's a security issue, I think I mentioned that at some point in
this thread. However, it doesn't allow a client to do something they
couldn't do anyway. If the sgid bit was respected and the directory was
created with the right group, the client could chgrp it to their primary
group afterwards. The security issue isn't that an evil client will take
advantage of this to end up with a directory owned by the wrong group; it's that a
poor innocent client will end up with a directory owned by their primary
group rather than the group of the parent directory, and any inherited
group@ ACL will apply to the primary group, resulting in insecure and
unintended access :(.

Another possible security issue that came up while I was discussing this
issue with one of the Linux NFSv4 developers is that relying upon the
client to set the ownership of the directory results in a race condition
and is in their opinion buggy.

In between the time the client generates the mkdir request and sends it
over the wire and the server receives it, someone else might have changed
the permissions or group ownership of the parent directory, resulting in
the explicitly specified group provided by the client being wrong. They
refuse to implement this buggy behavior and, to quote them, "You should get
Sun to fix their server."

I'm trying to do that, but no luck so far *sigh*...




Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-11-02 Thread Paul B. Henson
On Thu, 29 Oct 2009 casper@sun.com wrote:

 Do you have the complete NFS trace output?  My reading of the source code
 says that the file will be created with the proper gid so I am actually
 believing that the client over corrects the attributes after creating
 the file/directory.

Just wondering if you had a chance to look at the packet capture I sent and
the pointers to the Solaris source code that appears to be causing the sgid
bit to be ignored on directory creation over NFS.

The feedback I'm getting from sustaining on my support request is that they
don't think it's broken and they're not inclined to fix it. Even if the
spec doesn't explicitly define the behavior, respecting the sgid bit on
directory creation still seems like the right thing to do. If you agree,
perhaps you could use your considerable influence to try to improve
interoperability ;)? Or perhaps put me in touch with someone in forward
development, or someone in charge of attending NFS interoperability
bakeoffs, that might be more interested in improvements?

Thanks...



Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-11-02 Thread Paul B. Henson
On Sat, 31 Oct 2009, Al Hopper wrote:

 Kudos to you - nice technical analysis and presentation, Keep lobbying
 your point of view - I think interoperability should win out if it comes
 down to an arbitrary decision.

Thanks, but so far that doesn't look promising. Right now I've got a cron
job running every hour on the backend servers crawling around and fixing
permissions on new directories :(.
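The cron job itself isn't shown here. The Python below is a sketch of what such a job might look like; it is an assumption, not the script actually in use. It walks a tree top-down and resets the group of any directory whose parent has the setgid bit but a different group. It would need enough privilege to chgrp other users' directories. Because group@ ACL entries apply to whichever group owns the directory, correcting the group also changes who group@ grants access to.

```python
import os
import stat


def misgrouped_dirs(root):
    """Yield (path, current_gid, expected_gid) for directories whose parent
    has the setgid bit set but which are owned by a different group."""
    for dirpath, dirnames, _files in os.walk(root):
        parent = os.stat(dirpath)
        if not parent.st_mode & stat.S_ISGID:
            continue
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: don't follow symlinks to directories
            if stat.S_ISDIR(st.st_mode) and st.st_gid != parent.st_gid:
                yield path, st.st_gid, parent.st_gid


def fix_tree(root, dry_run=True):
    """Reset the group on misgrouped directories; return the paths found.

    The generator is consumed lazily, so a parent fixed here is already
    correct by the time os.walk descends into it and checks its children.
    """
    changed = []
    for path, _old_gid, new_gid in misgrouped_dirs(root):
        if not dry_run:
            os.chown(path, -1, new_gid)  # -1 leaves the owner unchanged
        changed.append(path)
    return changed
```

Running it first with dry_run=True lists the directories that would change without touching anything. In dry-run mode, a subdirectory nested under a misgrouped parent is only reported once the parent has actually been fixed.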

You would have thought something like this would have been noticed in one
of the NFS interoperability bakeoffs.




Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-10-30 Thread Paul B. Henson
On Fri, 30 Oct 2009, Darren J Moffat wrote:

 Have you tried using different values for the per dataset aclinherit or
 aclmode properties ?

We have aclmode set to passthrough and aclinherit to passthrough-x (thanks
again Mark!). We haven't tried anything else.

 I'm not sure they will help you much but I was curious if you had looked
 at this area for help.

If you saw the message I sent late yesterday, I found the code in the NFS
server that explicitly sets the group owner if one is not specified by the
client, so I don't think the filesystem has much choice; it's being told
explicitly which group the new directory should be owned by.

Thanks...



[zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-10-29 Thread Paul B. Henson

I posted a little while back about a problem we are having where, when a
new directory is created over NFS on a Solaris NFS server from a Linux
NFS client, the new directory's group ownership is that of the primary group
of the process, even if the parent directory has the sgid bit set and is
owned by a different group.

Basically, a Solaris client in such an instance explicitly requests that
the new directory be owned by the group of the parent directory, and the
server follows that request. A Linux NFS client, on the other hand, does
not explicitly request any particular group ownership for the new
directory, leaving the server to decide that on its own, which in the case
of the Solaris server, is not the right group.

The POSIX spec on this is somewhat ambiguous, so you can't really say the
Solaris implementation is broken, but while perhaps following the letter
of the spec, I don't think it's following the spirit of the sgid bit on
directories.

I have a CR, #6894234, which is currently being reviewed through Sun
support. It seems their current inclination is to not change the behavior.

Again, while not technically broken, I would argue this behavior is
undesirable. The semantics of the sgid bit on directories are that new
subdirectories should be owned by the group of the parent directory. That's
what happens under Solaris for local file system access. That's what
happens under Solaris if a directory is made via NFS from a Solaris NFS
client. It's not what happens when a new directory is created via NFS from
a Linux NFS client, or any other NFS client that does not explicitly
request the group ownership when creating a directory. While POSIX does not
explicitly specify what a server should do when creating a new directory
and the client does not specify the group ownership, in the case where the
new directory resides in an existing directory with the sgid bit set,
following standard sgid bit directory group ownership semantics seems the
most appropriate thing to do.
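The sgid-directory rule argued for above can be stated in a few lines. Below is a minimal Python model, purely illustrative: the function is hypothetical, not Solaris code, and it uses group names rather than numeric gids, with iit and csupomona as in the examples in this thread.

```python
import stat


def new_subdir_group(parent_mode, parent_group, creator_primary_group):
    """Group a new subdirectory should get under setgid-directory semantics.

    If the parent directory has the setgid bit, the child takes the parent's
    group; otherwise it takes the creating process's primary group.
    """
    if parent_mode & stat.S_ISGID:
        return parent_group
    return creator_primary_group


sgid_parent = stat.S_IFDIR | 0o2711   # drwx--s--x, owned by iit
plain_parent = stat.S_IFDIR | 0o711   # drwx--x--x, no setgid bit
print(new_subdir_group(sgid_parent, "iit", "csupomona"))
print(new_subdir_group(plain_parent, "iit", "csupomona"))
```

The point of contention is that this rule is applied for local access and for Solaris NFS clients, but not when a client leaves the group unspecified.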

If any Sun engineers with an interest in improved interoperability and
keeping true to the spirit of the sgid bit could take a look at this CR and
weigh in on its final resolution, that would be greatly appreciated.




Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-10-29 Thread Paul B. Henson
On Thu, 29 Oct 2009 casper@sun.com wrote:

 Do you have the complete NFS trace output?  My reading of the source code
 says that the file will be created with the proper gid so I am actually
 believing that the client over corrects the attributes after creating
 the file/directory.

Yes, we submitted that to support. It's SR#71757154, although I don't know
if they've kept the ticket up-to-date. My understanding of the current
status is that they have verified the behavior we describe, and given the
ambiguity of the POSIX spec are not necessarily inclined to change it.

I've attached a small packet capture from creating a subdirectory on a
Solaris 10U8 NFS server from both a Linux client and a Solaris client.

For the Linux client:

--
hen...@damien /mnt/sgid_test $ ls -ld .
drwx--s--x 2 henson iit 2 Oct 29 17:29 .

hen...@damien /mnt/sgid_test $ id
uid=1005(henson) gid=1012(csupomona)

hen...@damien /mnt/sgid_test $ mkdir linux

hen...@damien /mnt/sgid_test $ ls -l
total 2
drwx--s--x 2 henson csupomona 2 Oct 29 17:31 linux
--

The mkdir operation appears to consist of the compound call
PUTFH;SAVEFH;CREATE;GETFH;GETATTR;RESTOREFH;GETATTR; the CREATE call
specifies an attrmask of just FATTR4_MODE. The response to the GETATTR call
shows the FATTR4_OWNER_GROUP to be csupomona.

For the Solaris client:

--
hen...@s10 /mnt/sgid_test $ ls -ld .
drwx--s--x+  3 henson   iit            3 Oct 29 17:31 .

hen...@s10 /mnt/sgid_test $ id
uid=1005(henson) gid=1012(csupomona)

hen...@s10 /mnt/sgid_test $ mkdir solaris

hen...@s10 /mnt/sgid_test $ ls -l
total 4
drwx--s--x+  2 henson   iit            2 Oct 29 17:33 solaris
--

The mkdir in this case consists of the compound call
PUTFH;CREATE;GETFH;GETATTR;SAVEFH;PUTFH;GETATTR;RESTOREFH;NVERIFY;SETATTR,
the CREATE call specifies an attrmask of both FATTR4_MODE *and*
FATTR4_OWNER_GROUP with iit as the group.

In the reply to GETATTR, FATTR4_OWNER_GROUP is iit.


We don't see any evidence that the Linux client explicitly changes the
group ownership after the directory is made. If I might inquire, which
source code are you looking at? Is it available through the OpenSolaris
online source browser? If so, could I trouble you for a link to it?

Thanks much for any help you might provide in clarifying this issue, and if
our understanding of the behavior turns out to be accurate, any help in
getting a change committed to better respect the sgid bit :)...



linux_mkdir.pcap
Description: Binary data


solaris_mkdir.pcap
Description: Binary data


Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients

2009-10-29 Thread Paul B. Henson
On Thu, 29 Oct 2009 casper@sun.com wrote:

 Do you have the complete NFS trace output?  My reading of the source code
 says that the file will be created with the proper gid so I am actually
 believing that the client over corrects the attributes after creating
 the file/directory.

I dug around the OpenSolaris source code and believe I found where this
behavior is coming from.

In

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/nfs/nfs4_srv.c

on line 1643, there's a comment:

Set default initial values for attributes when not specified in
createattrs.

And if the uid/gid is not explicitly specified in the NFS CREATE operation,
the code calls crgetuid and crgetgid to determine what uid/gid to use for
the mkdir operation. crgetgid is just "return (cr->cr_gid);", which would
result in the behavior we describe -- if there is no group owner explicitly
specified, new subdirectories are always created based on the primary group
of the user, disregarding the presence of any sgid bit on the parent
directory.
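A simplified Python model of that defaulting logic, next to an sgid-aware alternative, may make the difference clearer. This is an illustration under stated assumptions, not the actual kernel code. Groups are names instead of gids, requested_group stands in for an owner_group attribute in the CREATE request, and group_sgid_aware is a hypothetical alternative, not existing Solaris behavior.

```python
import stat


def group_current(requested_group, cred_group):
    """Model of the described nfs4_srv.c default: use the group from the
    CREATE attributes if present, else the caller's credential gid."""
    return requested_group if requested_group is not None else cred_group


def group_sgid_aware(requested_group, cred_group, parent_mode, parent_group):
    """Hypothetical alternative: honour the parent's setgid bit before
    falling back to the credential."""
    if requested_group is not None:
        return requested_group
    if parent_mode & stat.S_ISGID:
        return parent_group
    return cred_group


parent_mode = stat.S_IFDIR | 0o2711  # drwx--s--x, as in the traces above
# A Linux-style CREATE carries no owner_group (requested_group is None).
print(group_current(None, "csupomona"))
print(group_sgid_aware(None, "csupomona", parent_mode, "iit"))
```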

As for the client:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/nfs/nfs4_vnops.c

On line 6790 the client code explicitly checks whether or not the new
directory is being created inside of a parent directory with an sgid bit
set, and then explicitly includes the group owner if so.
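On the client side, the difference seen in the two packet captures comes down to which attributes go into the CREATE. Below is a minimal Python model under stated assumptions. The function and its flag are hypothetical. The FATTR4_* names are used only as dictionary keys mirroring the attrmask bits seen in the traces. The mode 0o755 is an arbitrary example value.

```python
import stat


def create_attrs(mode, parent_mode, parent_group, client_sends_sgid_group):
    """Attributes a client might put in an NFSv4 CREATE for mkdir.

    client_sends_sgid_group=True models the behaviour described for the
    Solaris client; False models a client that sends only the mode.
    """
    attrs = {"FATTR4_MODE": mode}
    if client_sends_sgid_group and parent_mode & stat.S_ISGID:
        attrs["FATTR4_OWNER_GROUP"] = parent_group
    return attrs


parent_mode = stat.S_IFDIR | 0o2711  # drwx--s--x directory owned by iit
print(sorted(create_attrs(0o755, parent_mode, "iit", False)))
print(sorted(create_attrs(0o755, parent_mode, "iit", True)))
```

When the owner_group key is absent, the server's fallback to the caller's credential decides the group, which is how the Linux-created directory ends up owned by csupomona.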

I'm guessing you are probably looking at the actual underlying filesystem
code? That probably does do the right thing if the gid is not specified.
But given that the NFS server code explicitly uses the primary gid if no gid
is specified by the client, by the time it gets to the underlying file
system the gid is already specified and any filesystem-level sgid handling
is bypassed.

I doubt if the resolution to the problem is as simple as not having the NFS
server code explicitly specify a gid if none is given by the client,
allowing the underlying filesystem to do the right thing, but who knows
:)... I still think that the preferred behavior would be to respect the
sgid bit semantics, and continue to hope I can convince the engineers in
charge of this decision to agree.

Thanks...




Re: [zfs-discuss] importing pool with missing/failed log device

2009-10-22 Thread Paul B. Henson
On Thu, 22 Oct 2009, Victor Latushkin wrote:

 CR 6343667 synopsis is scrub/resilver has to start over when a snapshot is 
 taken:
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667

 so I do not see how it can be related to log removal.
 Could you please check bug number in question?

Ack, my bad, too many open cases :(, sorry. The correct bug for the
inquiry is CR 6707530.

Thanks...



[zfs-discuss] importing pool with missing/failed log device

2009-10-21 Thread Paul B. Henson

I've had a case open for a while (SR #66210171) regarding the inability to
import a pool whose log device failed while the pool was offline.

I was told this was CR #6343667, which was supposedly fixed in patches
141444-09/141445-09. However, I recently upgraded a system to U8 which
includes that kernel patch, and still am unable to import a pool with a
failed log device:

r...@ike ~ # zpool import
  pool: export
    id: 4066329346842580031
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

        export      UNAVAIL  missing device
          mirror    ONLINE
            c0t0d0  ONLINE
            c1t0d0  ONLINE
[...]
        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.

I have not as yet updated the pool to the new version included in U8, but I
was not told that was a prerequisite to getting the fix.

Is this issue supposed to have been fixed by that CR, or did that resolve
some other issue and I was misinformed on my support ticket?

Any information appreciated, thanks...




Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Paul B. Henson
On Tue, 20 Oct 2009, Frédéric VANNIERE wrote:

 You can't use the Intel X25-E because it has a 32 or 64 MB volatile cache
 that can't be disabled neither flushed by ZFS.

Say what? My understanding is that the officially supported Sun SSD for the
x4540 is an OEM'd Intel X25-E, so I don't see how it could not be a good
slog device.




Re: [zfs-discuss] Liveupgrade'd to U8 and now can't boot previous U6 BE :(

2009-10-19 Thread Paul B. Henson
On Sat, 17 Oct 2009, dick hoogendijk wrote:

 It's a bootblock issue. If you really want to get back to u6 you have to
 installgrub /boot/grub/stage1 /boot/grub/stage2 from th update 6 image
 so mount it (with lumount or easier, with zfs mount) and make sure you
 take the stage1 stage2 from this update. ***WARNING*** adter doing so,
 you're u6 will boot, but you're u8 will not. In activating update 8 all
 GRUB items are synced. That way all BE's are bootable. That's the way
 it's supposed to be. Maybe something went wrong and only the new u8 BE
 has the understanding of the new bootblocks.

I restored the U6 grub, and sure enough, I was able to boot my U6 BE again.
However, I was also still able to boot the U8 BE. Thanks much, I'll pass
this info on to my open support ticket and see what they have to say.




Re: [zfs-discuss] s10u8: lots of fixes, any commentary?

2009-10-16 Thread Paul B. Henson
On Thu, 15 Oct 2009, Enda O'Connor wrote:

 if you have a separate /var dataset on zfs root then Lu in update 8 ( or
 using latest 121430-42/121431-43 ) is broke. this is covered in CR
 6884728

I don't see that bugid available in bugs.opensolaris.org, is there any
place I can find more details on that?




Re: [zfs-discuss] Strange problem with liveupgrade on zfs (10u7 and u8)

2009-10-16 Thread Paul B. Henson
On Thu, 15 Oct 2009, Enda O'Connor wrote:

 This is 6884728 which is a regression from 6837400.
 the workaround is as you done, remove the lines from vfstab

Oh, ok, this is the problem described in CR 6884728? Disregard my earlier
inquiry for more details on that CR then.

I ran into exactly the same problem :(. There are no additional issues
other than simply removing the lines from vfstab? I have an open service
request but they haven't responded yet.

I actually did that, and booted into the U8 boot environment, and then
decided to revert to my previous U6 in case there were any other issues;
unfortunately, now the system won't boot at all :(. A panic message flashes
briefly on the screen but I don't have enough time to see what it is. And
the stinking java remote console for the x4500 seems to be broken and is
not passing keystrokes through. I haven't had the time yet to physically
visit the data center and plug a physical keyboard in to try to boot into
failsafe to resolve it.



[zfs-discuss] Liveupgrade'd to U8 and now can't boot previous U6 BE :(

2009-10-16 Thread Paul B. Henson

I used live upgrade to update a U6+lots'o'patches system to vanilla U8. I
ran across CR 6884728, which results in extraneous lines in vfstab
preventing successful boot. I logged in using maintenance mode and deleted
those lines, and the U8 BE came up ok. I wasn't sure if there were any
other problems from that, so I tried to activate and boot back into my
previous U6 BE. That now fails with this error:

  ***
  *  This device is not bootable!   *
  *  It is either offlined or detached or faulted.  *
  *  Please try to boot from a different device.*
  ***


NOTICE:
spa_import_rootpool: error 22

Cannot mount root on /p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@0,0:a
fstype zfs

panic[cpu0]/thread=fbc283a0: vfs_mountroot: cannot mount root

I can still boot fine into the new U8 BE, but so far have found no way to
recover and boot into my previously existing U6 BE.

I booted both BE's in verbose mode, the working one:

SunOS Release 5.10 Version Generic_141445-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
[...]
sd44 at marvell88sx3: target 7 lun 0
sd44 is /p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@7,0
/p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@7,0 (sd44) online
root on ospool/ROOT/s10u8 fstype zfs

and the failing one:

SunOS Release 5.10 Version Generic_141415-10 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
[...]
sd44 at marvell88sx3: target 7 lun 0
sd44 is /p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@7,0
/p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@7,0 (sd44) online

NOTICE:
spa_import_rootpool: error 22

Cannot mount root on /p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@0,0:a
fstype zfs



/p...@1,0/pci1022,7...@4/pci11ab,1...@1/d...@0,0:a is c3t0d0, which is part
of my root pool:

        NAME          STATE     READ WRITE CKSUM
        ospool        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c3t0d0s0  ONLINE       0     0     0
            c3t4d0s0  ONLINE       0     0     0


Any idea what's going on? Why is the U6 BE trying to mount a disk partition
instead of the appropriate zfs filesystem? Here's the grub config if that
helps:

#- patch-20090907 - ADDED BY LIVE UPGRADE - DO NOT EDIT  -

title patch-20090907
findroot (BE_patch-20090907,0,a)
bootfs ospool/ROOT/patch-20090907
kernel$ /platform/i86pc/multiboot -B $ZFS-BOOTFS
module /platform/i86pc/boot_archive

title patch-20090907 failsafe
findroot (BE_patch-20090907,0,a)
bootfs ospool/ROOT/patch-20090907
kernel /boot/multiboot -s
module /boot/x86.miniroot-safe

#- patch-20090907 -- END LIVE UPGRADE 
#- s10u8 - ADDED BY LIVE UPGRADE - DO NOT EDIT  -

title s10u8
findroot (BE_s10u8,0,a)
bootfs ospool/ROOT/s10u8
kernel$ /platform/i86pc/multiboot -B $ZFS-BOOTFS
module /platform/i86pc/boot_archive

title s10u8 failsafe
findroot (BE_s10u8,0,a)
bootfs ospool/ROOT/s10u8
kernel /boot/multiboot -s
module /boot/amd64/x86.miniroot-safe

#- s10u8 -- END LIVE UPGRADE 


Thanks for any help...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-13 Thread Paul B. Henson
On Tue, 13 Oct 2009 casper@sun.com wrote:

 If you look at the code in ufs and zfs, you'll see that they both create
 the mode correctly and the same code is used through NFS.

 There's another scenario: the Linux client updates the attributes after
 creating the file/directory.

I don't think that is the case. My colleague Brian captured the network
traffic and analyzed it, and if I understood him correctly the Linux client
issues the mkdir op with no group specified, which per RFC indicates the
server should set the appropriate group. On the Solaris client, the nfs
mkdir op explicitly specifies the group.

Brian is going to follow up shortly with more technical detail.

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-13 Thread Paul B. Henson
On Tue, 13 Oct 2009, Joerg Schilling wrote:

 The correct behavior would be to assign the group ownership of the parent
 directory to a new directory (instead of using the current process
 credentials) in case that the sgid bit is set in the parent directory.
 Is this your problem?

Yes, that is exactly our problem -- when a Linux NFSv4 client creates a
directory on a Solaris NFSv4 server when the parent directory has the sgid
bit set and a different group owner than the user's primary group, the new
directory is incorrectly created with the primary group as group owner
rather than the parent directory group.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris 10 samba in AD mode broken when user in > 32 AD groups

2009-10-13 Thread Paul B. Henson

We're currently using the Sun bundled Samba to provide CIFS access to our
ZFS user/group directories.

I found a bug in Active Directory integration mode: if a user is in more
than 32 Active Directory groups, samba calls setgroups with a list of more
than 32 groups, which fails, leaving the user with absolutely no group
privileges beyond their primary group.

I opened a Sun service request, #71547904, to try and get this resolved.
When I initially opened it, I did not know what the underlying problem was.
However, I wasn't making any progress through Sun tech support, so I ended
up installing the Sun samba source code package and diagnosing the problem
myself. In addition, I provided Sun technical support with a simple two-line
patch that fixes the problem.

Unfortunately, I am getting the complete runaround on this issue, and after
almost two months I have been unable to get the problem fixed.

They keep telling me that support for more than 32 groups in Solaris is not
a bug, but rather an RFE. I completely agree -- I'm not asking for Solaris
to support more than 32 groups (although, as an aside, it sure would be
nice if it did -- 32 is pretty small nowadays; I doubt this will get fixed
in Solaris 10, but does anyone have any idea about possible progress on that in
OpenSolaris?); all I'm asking is that samba be fixed so the user at
least gets the first 32 groups they are in rather than none at all. That is
the behavior for a local login or over NFS: the effective group privileges
are those of the first 32 groups.
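A user-level process like smbd can avoid the all-or-nothing failure by clamping the list to the system limit before calling setgroups(). A minimal sketch, assuming the limit is taken from sysconf(_SC_NGROUPS_MAX) (32 on the systems discussed here); the helper names are hypothetical, and the setgroups() call itself is left out because it needs privilege and is declared in different headers on Solaris (<unistd.h>) and Linux (<grp.h>):

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the running system for its supplementary group limit.
 * Returns -1 if the limit is indeterminate or the query fails. */
long system_ngroups_max(void)
{
    return sysconf(_SC_NGROUPS_MAX);
}

/* How many entries of a `requested`-long group list can be handed to
 * setgroups() without it failing: the first ngroups_max are kept. */
size_t usable_group_count(size_t requested, long ngroups_max)
{
    if (ngroups_max < 0)
        return requested;              /* no determinate limit reported */
    if (requested > (size_t)ngroups_max)
        return (size_t)ngroups_max;    /* truncate instead of failing */
    return requested;
}
```

The caller would then pass only the first `usable_group_count(n, system_ngroups_max())` gids to setgroups(), giving the "first 32 groups" behavior described above instead of none at all.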

Evidently the samba engineering group is in Prague. I don't know if it is a
language problem, or where the confusion is coming from, but even after
escalating this through our regional support manager, they are still
refusing to fix this bug and claiming it is an RFE.

I think based on the information I provided it should be blindingly obvious
that this is a bug, with a fairly trivial fix. I'm pretty sure if they had
just fixed it rather than spending all this time arguing about it, it would
have taken less time and resources than they've already wasted 8-/.

While not directly a ZFS problem, I was hoping one of the many intelligent
and skilled Sun engineers that hang out on this mailing list :) might do me
a big favor, look at SR#71547904, confirm that it is actually a bug, and
use their internal contacts to somehow convince the samba sustaining
engineering group to fix it? Please?

Thanks much...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 samba in AD mode broken when user in > 32 AD groups

2009-10-13 Thread Paul B. Henson
On Tue, 13 Oct 2009 casper@sun.com wrote:

 So why not the built-in CIFS support in OpenSolaris?  Probably has a
 similar issue, but still.

I wouldn't think it has this same issue; presumably it won't support more
than the kernel limit of 32 groups, but I can't imagine that in the case
when a user is in more than 32 active directory groups it would simply
discard all group membership :(. I haven't tested it, but I would guess it
would behave like the underlying operating system and simply truncate the
group list at 32, with the user losing any additional privileges granted by
the rest of the groups.

I definitely have my eye on transitioning to OpenSolaris, hopefully
sometime in mid to late next year. Unfortunately, OpenSolaris wasn't quite
enterprise ready when we went into production with this system, and while I
think by now it's pretty close if not there, it's going to take some time
to put together a prototype, sell management on it, and migrate production
services.

 That's not nice and that should be fixed even when the OS doesn't support
 more than 32 bits.  How many groups do you want?

All of them :). I think currently the most groups any single user is in is
about 100. 64 would probably cover everyone except a handful of users.
Linux currently supports a maximum of 65536 groups per user; while I won't
make the mistake of saying no one would ever need more than that ;), I
don't think we would exceed that any time soon.

 I'm actually working on fixing this in OpenSolaris and we may even
 backport this to S10.

Really? Cool. Any timeline on getting it into a development build? What's
the current maximum number of groups you're working towards? Better group
support would be another bullet point for transitioning to OpenSolaris.

Regarding Solaris 10, my understanding was that the current 32 group limit
could only be changed by modifying internal kernel structures that would
break backwards compatibility, which wouldn't happen because Solaris
guarantees backwards binary compatibility. I could most definitely be
mistaken though.

 What's the bug number?

There is no bug number :(, as they refuse to classify it as a bug -- they
keep insisting it is an RFE, and pointing towards the existing RFE #'s for
increasing the number of groups supported by Solaris.

The service request is #71547904, although now that I think about it they
haven't been keeping the ticket updated. I'll send you a copy of the thread
I've had with the support engineers directly.

Here's the patch I submitted. It adds three lines, one of which is blank
8-/. I'm just really confused why they'd rather spend months arguing it
isn't a bug rather than just spending five minutes applying this simple
patch *sigh*. I'd just run the version I compiled locally, but it's fairly
clear that the source code provided is not the same as the source code used
to generate the production binary, so I'd really prefer an official fix.


r...@niblet /usr/sfw/src/samba/source/auth # diff -u auth_util.c.orig auth_util.c
--- auth_util.c.orig	Fri Sep 11 16:18:46 2009
+++ auth_util.c	Fri Sep 11 16:25:56 2009
@@ -1042,6 +1042,7 @@
TALLOC_CTX *mem_ctx;
NTSTATUS status;
size_t i;
+   int ngroups_max = groups_max();


mem_ctx = talloc_new(NULL);
@@ -1099,6 +1100,8 @@
}
add_gid_to_array_unique(server_info, gid,
&server_info->groups,
&server_info->n_groups);
+
+   if (server_info->n_groups == ngroups_max) break;
}

debug_nt_user_token(DBGC_AUTH, 10, server_info->ptok);
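The effect of the patch can be sketched outside Samba. `add_gid_capped` below is a hypothetical, simplified stand-in for add_gid_to_array_unique() plus the new early break: it appends a gid only if it is not already present and tells the caller to stop once the list holds ngroups_max entries. It uses a caller-supplied fixed array instead of talloc, so it is not a drop-in replacement for the Samba code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Append gid to gids[0..*n) if not already present and there is room.
 * Returns false once the list has reached max entries, i.e. when the
 * caller should stop adding groups. */
bool add_gid_capped(gid_t *gids, size_t *n, size_t max, gid_t gid)
{
    for (size_t i = 0; i < *n; i++)
        if (gids[i] == gid)
            return *n < max;        /* duplicate: nothing added */
    if (*n < max)
        gids[(*n)++] = gid;
    return *n < max;
}

/* Build a group list from candidate gids, keeping the first `max`
 * unique entries and silently dropping the rest. */
size_t collect_groups(const gid_t *candidates, size_t count,
                      gid_t *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++)
        if (!add_gid_capped(out, &n, max, candidates[i]))
            break;                  /* mirrors `if (n_groups == ngroups_max) break;` */
    return n;
}
```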



-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 samba in AD mode broken when user in > 32 AD groups

2009-10-13 Thread Paul B. Henson
On Tue, 13 Oct 2009 casper@sun.com wrote:

 That's not entirely true; the issue is similar having more than 16 groups
 as it breaks AUTH_SYS over-the-wire authentication but we already have
 that now.
[...]
 For now, we're aiming for 1024 groups but also make sure that the
 userland will work without any dependencies.

Good to know; I'm definitely looking forward to this. 1024 will hopefully
suffice for at least a while :).
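For context on the 16-group figure: in the ONC RPC specification (RFC 5531), the AUTH_SYS credential body, authsys_parms, declares its supplementary group list as `unsigned int gids<16>`, so an AUTH_SYS client can send at most 16 groups and the server never sees the rest. A sketch with a simplified, hypothetical struct rather than the XDR-generated one:

```c
#include <stddef.h>
#include <sys/types.h>

#define AUTHSYS_MAX_GIDS 16   /* "unsigned int gids<16>" in authsys_parms */

/* Simplified, hypothetical stand-in for the AUTH_SYS credential body;
 * the real one is XDR-encoded and also carries a stamp and machine name. */
struct authsys_sketch {
    unsigned int uid;
    unsigned int gid;
    unsigned int ngids;
    unsigned int gids[AUTHSYS_MAX_GIDS];
};

/* Copy supplementary groups into the credential, keeping at most 16.
 * Returns how many groups did not fit and so never reach the server. */
size_t authsys_fill_gids(struct authsys_sketch *cred,
                         const gid_t *groups, size_t n)
{
    size_t keep = n > AUTHSYS_MAX_GIDS ? AUTHSYS_MAX_GIDS : n;

    for (size_t i = 0; i < keep; i++)
        cred->gids[i] = (unsigned int)groups[i];
    cred->ngids = (unsigned int)keep;
    return n - keep;
}
```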

 The change request, then.  It must have a bug id.

The only number I have unique to my request is the SR #. There has been no
bug opened, and as I mentioned they are referring to an existing RFE
regarding increasing the maximum number of groups supported by the
operating system (these references are in the thread I forwarded you
directly) which is simply not relevant. In fact, it appears my service
request has been marked as canceled without my knowledge, leaving pretty
much no official trail of my request :(.

 Well, I can understand the sense of that.  (Not for OpenSolaris, but for
 S10)  A backport cost a bit so perhaps that's what they want to avoid.

I can't see the cost of applying a three line patch as being particularly
high, but I guess there is some inherent cost in quality control, testing,
and packaging a patch. But upstream just released some security fixes for
the 3.0.x branch, which hopefully they're going to incorporate and release
in a patch, and the incremental cost of adding in my simple fix must be
negligible.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 samba in AD mode broken when user in > 32 AD groups

2009-10-13 Thread Paul B. Henson
On Tue, 13 Oct 2009, Drew Balfour wrote:

 Ah. No. If you're using idmap and are mapping to an AD server, the
 windows SIDs (which are both users and groups) are stored in a cred
 struct (in cr_ksid) which allows more than 32 groups, up to 64k iirc.

Ah, yes, I neglected to consider that, given the CIFS server in OpenSolaris
runs in-kernel, it's not subject to the same OS limitations as a user-level
process. Once Casper finishes his work and access via NFS is no longer
limited to 32 groups that will be quite sweet...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-12 Thread Paul B. Henson

We're running Solaris 10 with ZFS to provide home and group directory file
space over NFSv4. We've run into an interoperability issue between the
Solaris NFS server and the Linux NFS client regarding the sgid bit on
directories and assigning appropriate group ownership on newly created
subdirectories.

If a directory with the sgid bit set is owned by a group other than the
user's primary group, new directories created in it are owned by the
primary group rather than by the group of the parent directory.

Evidently, the Solaris NFS server assumes the client will specify the
correct owner of the directory, whereas the Linux NFS client assumes the
server is in charge of implementing the sgid functionality and will assign
the right group itself. As such, with a Solaris server and a Linux client
the functionality is simply broken :(.

This poses a considerable security issue, as the GROUP@ inherited ACL now
provides access to the primary group of the user rather than the intended
group, which as you might imagine is somewhat problematic.

Ideally, it seems that the server should be responsible for this, rather
than the client voluntarily enforcing it. Is this functionality strictly
defined anywhere, or is it implementation dependent? You'd think
something like this would have turned up in an interoperability bake-off at
some point.

Thanks for any information...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-12 Thread Paul B. Henson
On Mon, 12 Oct 2009, Mark Shellenbaum wrote:

 Does it only fail under NFS or does it only fail when inheriting an ACL?

It only fails over NFS from a Linux client; locally it works fine, and from
a Solaris client it works fine. It also only seems to fail on directories;
files receive the correct group ownership:

$ uname -a
Linux damien 2.6.27-gentoo-r8 #7 SMP Tue May 26 13:15:08 PDT 2009 x86_64
Dual Core AMD Opteron(tm) Processor 280 AuthenticAMD GNU/Linux

$ id
uid=1005(henson) gid=1012(csupomona)

$ mount | grep henson
kyle.unx.csupomona.edu:/export/user/henson on /user/henson type nfs4
(rw,sec=krb5p,clientaddr=134.71.247.8,sloppy,addr=134.71.247.14)

$ ls -ld .
drwx--s--x 3 henson iit 4 Oct 12 15:58 .

$ touch foo
$ mkdir bar
$ ls -l

total 1
drwxr-sr-x 2 henson csupomona 2 Oct 12 15:58 bar
-rw-r--r-- 1 henson iit   0 Oct 12 15:58 foo

New directory group ownership is wrong whether the containing directory has
an inheritable ACL or not.
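The shell test above can also be wrapped in a small C check, for example to run repeatedly against different mounts. This is a sketch with a hypothetical function name: it creates a subdirectory, compares its group with the parent's, prints both, and removes the subdirectory. Pointed at an sgid directory on the Linux client's NFS mount, it would be expected to return 0 under the behavior reported here; on a local file system it should return 1.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create parent/name, compare its group with the parent's, then remove it.
 * Returns 1 if the groups match, 0 if they differ, -1 on any error.
 * Meaningful when the parent has S_ISGID set and a group other than the
 * caller's primary group. */
int check_sgid_inheritance(const char *parent, const char *name)
{
    char path[4096];
    struct stat pst, cst;
    int len, result;

    if (stat(parent, &pst) != 0)
        return -1;
    len = snprintf(path, sizeof path, "%s/%s", parent, name);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;
    if (mkdir(path, 0755) != 0)
        return -1;
    if (stat(path, &cst) != 0) {
        rmdir(path);
        return -1;
    }
    result = (cst.st_gid == pst.st_gid) ? 1 : 0;
    printf("%s: parent gid %ld, new dir gid %ld, sgid %s\n",
           path, (long)pst.st_gid, (long)cst.st_gid,
           (cst.st_mode & S_ISGID) ? "set" : "clear");
    rmdir(path);                    /* clean up the test directory */
    return result;
}
```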

I only have ZFS filesystems exported right now, but I assume it would
behave the same for UFS. The underlying issue seems to be the Sun NFS
server expects the NFS client to apply the sgid bit itself and create the
new directory with the parent directory's group, while the Linux NFS client
expects the server to enforce the sgid bit.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-15 Thread Paul B. Henson

 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00 c8


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-15 Thread Paul B. Henson
On Tue, 15 Sep 2009, Eric Schrock wrote:

 I don't have the ATA spec in front of me, but that that looks like pretty
 normal output to me.  Glad to hear they addressed the issue.

Excellent; I reinstalled it in my test x4500. If no other issues show up, I
can try to get my proposal to install them in production going again;
they make a huge difference for common sysadmin operations such as tarball
extraction or code development scenarios like revision control checkouts.
If I'm lucky, maybe the ability to import a pool with a dead slog will make
it into U8; that was the only other potential snag in my deployment plan,
as I'd only have one SSD in each system.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-13 Thread Paul B. Henson
On Sat, 12 Sep 2009, Paul B. Henson wrote:

 In any case, I agree with you that the firmware is buggy; however I
 disagree with you as to the outcome of that bug. The drive is not
 returning random garbage, it has *one* byte wrong. Other than that all of
 the data seems ok, at least to my inexpert eyes. smartctl under Linux
 issues a warning about that invalid byte and reports everything else ok.
 Solaris on an x4500 evidently barfs over that invalid byte and returns
 garbage.

On another note, my understanding is that the official Sun-sold
and Sun-supported SSD for the x4540 is basically just an OEM'd Intel X25-E. Did
Sun install their own fixed firmware on their version of that drive, or
does it have the same buggy firmware as the street version? It would be
funny if you guys were shipping a drive with buggy firmware that just
happens to work because the x4540 hardware doesn't trip over the one
invalid byte :)...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-13 Thread Paul B. Henson
On Sun, 13 Sep 2009, Mike Gerdts wrote:

 August 11 they released firmware revisions 8820, 8850, and 02G9,
 depending on the drive model.

Ooooh, cool, last time I checked they only had updates for the X25-M.
Thanks for the pointer.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-12 Thread Paul B. Henson
On Fri, 11 Sep 2009, Eric Schrock wrote:

 It's clearly bad firmware - there's no bug in the sata driver.  That
 drive basically returns random data, and if you're unlucky that
 randomness will look like a valid failure response.  In the process I
 found one or two things that could be tightened up with the FMA analysis,
 but when your drive is returning random log data it's impossible to
 actually fix the problem in software.

Well, I won't claim the drive firmware is completely innocent, but as
evidenced in

http://mail.opensolaris.org/pipermail/fm-discuss/2009-June/000436.html

smartctl on a Linux box seems to work just fine. The exact same model drive
also works just fine in an x4540. So I think the assertion that the drive
returns random data is demonstrably false. There's something about the SSD
in an x4500 that just doesn't play nice -- it might be partially the drive
firmware, it might be the SAS controller, it might be something else -- but
it's *not* simply random data being returned from the drive.

It would be really appreciated if that problem could be tracked down so the
drive works as well SMART-wise in an x4500 as it does in a Linux box or an
x4540, but I understand Sun does not certify the x4500 with SSDs, so
there's no expectation that would happen. But it would be really really
appreciated :)...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-12 Thread Paul B. Henson
On Sat, 12 Sep 2009, Eric Schrock wrote:

 Also, were you ever able to get this disk behind a SAS transport (X4540,
 J4400, J4500, etc)?  It would be interesting to see how hardware SATL
 deals with this invalid data.  Output from 'smartctl -d sat' and
 'smartctl -d scsi' on such a system would show both the ATA data and the
 translated SCSI data.  My guess is that it just gives up at the first
 invalid version record, something we should probably be doing.

Phil Steinbachs gave you some data from an X25-E in a J4400 attached to an
X4240 via an LSI 1068E based HBA, as well as one in one of the X4240's SAS
slots connected to the internal Adaptec RAID controller:

http://mail.opensolaris.org/pipermail/fm-discuss/2009-June/000432.html

and:

http://mail.opensolaris.org/pipermail/fm-discuss/2009-June/000435.html

Your last email on the subject was:

http://mail.opensolaris.org/pipermail/fm-discuss/2009-June/000447.html

in which you said:

The primary thing is that this drive is completely busted - it's reporting
totally invalid data in response to the ATA READ EXT LOG command for log
0x07 (Extended SMART self-test log).  The spec defines that byte 0 must be
0x1 and that byte 1 is reserved.

Phil might still be in a position to run smartctl on the drives if you're
still interested in the data.

I guess this is why you're now saying the drive is returning invalid data;
I had forgotten the details, as that was almost three months ago.

In any case, I agree with you that the firmware is buggy; however I
disagree with you as to the outcome of that bug. The drive is not returning
random garbage, it has *one* byte wrong. Other than that all of the data
seems ok, at least to my inexpert eyes. smartctl under Linux issues a
warning about that invalid byte and reports everything else ok. Solaris on
an x4500 evidently barfs over that invalid byte and returns garbage.

Overall, I think the Linux approach seems more useful. Be strict in what
you generate, and lenient in what you accept ;), or something like that. As
I already said, it would be really really nice if the Solaris driver could
be fixed to be a little more forgiving and deal better with the drive, but
I've got no expectation that it should be done. But it could be :).
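As an illustration of the "lenient" approach, here is a sketch that checks the revision byte of the Extended SMART self-test log (log 0x07) described in Eric's quoted explanation above, either rejecting the page (strict) or warning and continuing (lenient). The function and enum names are hypothetical, and reading the self-test descriptor index from bytes 2-3 as a little-endian 16-bit value is an assumption based on smartmontools' layout of this log, not something stated in the thread.

```c
#include <stdint.h>
#include <stdio.h>

#define EXT_SELFTEST_LOG_ADDR 0x07  /* log address discussed in the thread */
#define EXT_SELFTEST_LOG_REV  0x01  /* required value of byte 0 */

enum selftest_log_check {
    SELFTEST_LOG_OK,        /* revision byte as expected */
    SELFTEST_LOG_SUSPECT,   /* bad revision byte, parsing continued */
    SELFTEST_LOG_REJECTED   /* bad revision byte, page refused */
};

/* Check byte 0 of a 512-byte Extended SMART self-test log page.
 * strict (lenient == 0): refuse the whole page on a bad revision byte.
 * lenient (lenient != 0): warn and keep going, roughly what the thread
 * reports smartctl doing. The descriptor index is read either way. */
enum selftest_log_check check_ext_selftest_log(const uint8_t page[512],
                                               int lenient,
                                               unsigned *desc_index)
{
    *desc_index = (unsigned)page[2] | ((unsigned)page[3] << 8);
    if (page[0] == EXT_SELFTEST_LOG_REV)
        return SELFTEST_LOG_OK;
    if (!lenient)
        return SELFTEST_LOG_REJECTED;
    fprintf(stderr,
            "warning: log 0x%02x revision byte is 0x%02x, expected 0x%02x; continuing\n",
            EXT_SELFTEST_LOG_ADDR, page[0], EXT_SELFTEST_LOG_REV);
    return SELFTEST_LOG_SUSPECT;
}
```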

Thanks again for your help. I apologize if I've been a bit antagonistic; I
tend to become a dog with a bone when I start debating something.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2009-09-11 Thread Paul B. Henson
On Thu, 10 Sep 2009, Alex Li wrote:

 We finally resolved this issue by change LSI driver. For details, please
 refer to here
 http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/

I believe you hijacked my thread ;).

x4500s have Marvell SATA controllers, not LSI. My issue with Intel SSDs
being marked faulty in X4500s has yet to be resolved. The last time I
rebooted it, fm started marking the SSD failed again due to invalid
self-check log data. I had some correspondence with Eric Schrock who
indicated it looked like a combination of buggy Intel firmware and a bug in
the Solaris SATL driver, but haven't heard back from him as to whether they
might fix it.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  1   2   >