Re: [zfs-discuss] zfs promote and ENOSPC

2008-06-11 Thread Robin Guo
Hi, Mike,

  It looks like 6452872; it needs enough space for 'zfs promote'.

  - Regards,

Mike Gerdts wrote:
 I needed to free up some space to be able to create and populate a new
 upgrade.  I was caught off guard by the amount of free space required
 by zfs promote.

 bash-3.2# uname -a
 SunOS indy2 5.11 snv_86 i86pc i386 i86pc

 bash-3.2# zfs list
 NAME   USED  AVAIL  REFER  MOUNTPOINT
 rpool 5.49G  1.83G55K  /rpool
 [EMAIL PROTECTED] 46.5K  -  49.5K  -
 rpool/ROOT5.39G  1.83G18K  none
 rpool/ROOT/2008.052.68G  1.83G  3.38G  legacy
 rpool/ROOT/2008.05/opt 814M  1.83G  22.3M  legacy
 rpool/ROOT/2008.05/[EMAIL PROTECTED]43K  -  22.3M  -
 rpool/ROOT/2008.05/opt/SUNWspro739M  1.83G   739M  legacy
 rpool/ROOT/2008.05/opt/netbeans   52.9M  1.83G  52.9M  legacy
 rpool/ROOT/preview2   2.71G  1.83G  2.71G  /mnt
 rpool/ROOT/[EMAIL PROTECTED] 6.13M  -  2.71G  -
 rpool/ROOT/preview2/opt 27K  1.83G  22.3M  legacy
 rpool/export  89.8M  1.83G19K  /export
 rpool/export/home 89.8M  1.83G  89.8M  /export/home

 bash-3.2# zfs promote rpool/ROOT/2008.05
 cannot promote 'rpool/ROOT/2008.05': out of space

 Notice that I have 1.83 GB of free space and the snapshot from which
 the clone was created (rpool/ROOT/[EMAIL PROTECTED]) is 2.71 GB.  It
 was not until I had more than 2.71 GB of free space that I could
 promote rpool/ROOT/2008.05.

 This behavior does not seem to be documented.  Is it a bug in the
 documentation or zfs?

   


-- 
Regards,

Robin Guo, Xue-Bin Guo
Solaris Kernel and Data Service QE,
Sun China Engineering and Research Institute
Phone: +86 10 82618200 +82296
Email: [EMAIL PROTECTED]
Blog: http://blogs.sun.com/robinguo

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SMC Webconsole 3.1 and ZFS Administration 1.0 - stacktraces in snv_b89

2008-06-11 Thread Jim Klimov
Likewise. Just plain doesn't work.

Not required though, since the command-line is okay and way powerful ;)

And there are some more interesting challenges to work on, so I haven't pushed 
this problem any further yet.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS conflict with MAID?

2008-06-11 Thread Tobias Exner


Richard Elling schrieb:
 Tobias Exner wrote:
 Hi John,

 I've done some tests with a Sun X4500 with zfs and MAID, using the 
 powerd of Solaris 10 to power down the disks which weren't accessed for 
 a configured time. It's working fine...

 The only thing I ran into was that it took around a 
 minute to power on 4 disks in a zfs pool. The problem seems to be 
 that powerd starts the disks sequentially.

 Did you power down disks or spin down disks?  It is relatively
 easy to spin down (or up) disks with luxadm stop (start).  If a
 disk is accessed, then it will spin itself up.  By default, the timeout
 for disk response is 60 seconds, and most disks can spin up in
 less than 60 seconds.
luxadm is not very helpful when I want an automatic MAID solution.

The Solaris powerd just spins the disks down automatically and the power 
consumption falls below 1 watt (3.5" drives).
My tests show that it takes around 20 seconds to power up a single 
disk and get access to it.

Actually I don't know why it takes 55 seconds to spin up 4 disks in a 
zfs pool, but those are my results.



 I tried to open an RFE... but so far without success.


 Perhaps because disks will spin up when an access is requested,
 so to solve your problem you'd have to make sure that all of
 a set of disks are accessed when any in the set are accessed --
 butugly.
As far as I know, when I'm using a zfs pool I have no way to control 
which disk will be accessed when I read or write.
Do you know more?


 NB. back when I had a largish pile of smallish disks hanging
 off my workstation for testing, a simple cron job running
 luxadm stop helped my energy bill :-)
 -- richard
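
(A minimal sketch of the kind of cron job Richard describes, purely for
illustration: the device paths, script path, and schedule below are invented,
and luxadm stop is given the raw device path of each disk to spin down.)

  #!/bin/sh
  # spindown.sh - spin down a fixed list of archive disks (run from cron)
  for d in /dev/rdsk/c2t0d0s2 /dev/rdsk/c2t1d0s2 /dev/rdsk/c2t2d0s2
  do
          luxadm stop "$d"
  done

  # example crontab entry, spinning the disks down every night at 22:00:
  # 0 22 * * * /usr/local/bin/spindown.sh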



regards,

Tobias
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Al Hopper
I've been reading, with great (personal/professional) interest about
Sun getting very serious about SSD-equipping servers as a standard
feature in the 2nd half of this year.  Yeah!  Excellent news - and
it's nice to see Sun lead, rather than trail, the market!  Those of us
who are ZFS zealots know the value of a ZFS log and/or ZFS cache
device and how these devices can (very positively) impact the
performance of a ZFS raid configuration built on cost effective SATA
disk drives.  But - based on personal observation - there is a lot of
hype surrounding SSD reliability.  Obviously the *promise* of this
technology is higher performance and *reliability* with lower power
requirements due to no (mechanical) moving parts.  But... if you look
broadly at the current SSD product offerings, you see: a) lower than
expected performance - particularly in regard to write IOPS (I/O Ops
per Second) and b) warranty periods that are typically 1 year - with
the (currently rare) exception of products that are offered with a 5
year warranty.

Obviously, for SSD products to live up to the current marketing hype,
they need to deliver superior performance and *reliability*.
Everyone I know *wants* one or more SSD devices - but they also have
the expectation that those devices will come with a warranty at least
equivalent to current hard disk drives (3 or 5 years).

So ... I'm interested in learning from anyone on this list, and, in
particular, from Team ZFS, what the reality is regarding SSD
reliability.  Obviously Sun employees are not going to compromise
their employment and divulge upcoming product specific data - but
there must be *some* data (white papers etc) in the public domain that
would provide some relevant technical data??

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Tobias Exner
Hi Al,

Sorry, but "leading the market" is not accurate at this point.

www.superssd.com has had the answers to those questions about SSD 
reliability and speed for many years.

But I'm with you. I'm looking forward to Sun's coming SSD products.


btw: it seems to me that this thread is a little bit OT.

regards,

Tobias Exner



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Al Hopper
On Wed, Jun 11, 2008 at 3:59 AM, Tobias Exner [EMAIL PROTECTED] wrote:
 Hi Al,

 Sorry, but "leading the market" is not accurate at this point.

 www.superssd.com has had the answers to those questions about SSD
 reliability and speed for many years.

 But I'm with you. I'm looking forward to Sun's coming SSD products.


 btw: it seems to me that this thread is a little bit OT.

I don't think it's OT - because SSDs make perfect sense as ZFS log
and/or cache devices.  If I did not make that clear in my OP then I
failed to communicate clearly.  In both these roles (log/cache)
reliability is of the utmost importance.

Thanks for the link - I'll take a look/see.

 regards,

 Tobias Exner





Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Adam Leventhal
On Jun 11, 2008, at 1:16 AM, Al Hopper wrote:
 But... if you look
 broadly at the current SSD product offerings, you see: a) lower than
 expected performance - particularly in regard to write IOPS (I/O Ops
 per Second)

True. Flash is quite asymmetric in its performance characteristics.
That said, the L2ARC has been specially designed to play well with the
natural strengths and weaknesses of flash.

 and b) warranty periods that are typically 1 year - with
 the (currently rare) exception of products that are offered with a 5
 year warranty.

You'll see a new class of SSDs -- eSSDs -- designed for the enterprise
with longer warranties and more write/erase cycles. Further, ZFS will
do its part by not killing the write/erase cycles of the L2ARC by
constantly streaming as fast as possible. You should see lifetimes in
the 3-5 year range on typical flash.

 Obviously, for SSD products to live up to the current marketing hype,
 they need to deliver superior performance and *reliability*.
 Everyone I know *wants* one or more SSD devices - but they also have
 the expectation that those devices will come with a warranty at least
 equivalent to current hard disk drives (3 or 5 years).

I don't disagree entirely, but as a cache device flash actually can be
fairly unreliable and we'll pick it up in ZFS.

 So ... I'm interested in learning from anyone on this list, and, in
 particular, from Team ZFS, what the reality is regarding SSD
 reliability.  Obviously Sun employees are not going to compromise
 their employment and divulge upcoming product specific data - but
 there must be *some* data (white papers etc) in the public domain that
 would provide some relevant technical data??


A typical high-end SSD can sustain 100k write/erase cycles so you can
do some simple math to see that a 128GB device written to at a rate of
150M/s will last nearly 3 years. Again, note that unreliable devices
will result in a performance degradation when you fail a checksum in
the L2ARC, but the data will still be valid out of the main storage
pool.
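
(As a rough sanity check on that arithmetic, assuming the figures above -- 128 GB
capacity, 100k write/erase cycles, 150 MB/s sustained writes -- and perfect wear
leveling:)

  # write budget = capacity * cycles; lifetime = budget / yearly write volume
  echo '128 * 100000 / (150 / 1024 * 86400 * 365)' | bc -l
  # prints roughly 2.77, i.e. a little under 3 years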

You're going to see much more on this in the next few months. I made a
post to my blog that probably won't answer your questions directly, but
may help inform you about what we have in mind.

   http://blogs.sun.com/ahl/entry/flash_hybrid_pools_and_future

Adam

--
Adam Leventhal, Fishworks                    http://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Tobias Exner




The reliability of flash increases a lot if "wear leveling" is
implemented and there's the capability to build a RAID over a couple of
flash modules (maybe automatically by the controller).
And if there are RAM modules as a cache in front of the flash, most
problems regarding fast read and write access will be solved.

I'm very interested in what kind of data security will be implemented by
Sun in the future. I was not able to find any technical information until
now.

@ Adam
I never heard about "eSSD". Do you have more information about this?
Google and I cannot find anything.

regards,

Tobias





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Darren J Moffat
Tobias Exner wrote:
 The reliability of flash increases a lot if wear leveling is 
 implemented and there's the capability to build a RAID over a couple of 
 flash modules (maybe automatically by the controller).
 And if there are RAM modules as a cache in front of the flash, most 
 problems regarding fast read and write access will be solved.
 
 I'm very interested in what kind of data security will be implemented by 
 Sun in the future. I was not able to find any technical information until now.

If by data security you mean encrypting the data then see this project:

http://opensolaris.org/os/project/zfs-crypto/

If you don't mean encrypting the data in the filesystem, then what do you 
mean by that term?

Note that Sun has tape encryption products as well.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?

2008-06-11 Thread Richard L. Hamilton
 On Sat, 7 Jun 2008, Mattias Pantzare wrote:
 
  If I need to count usage I can use du. But if you can implement space
  usage info on a per-uid basis you are not far from quota per uid...
 
 That sounds like quite a challenge.  UIDs are just numbers and new
 ones can appear at any time.  Files with existing UIDs can have their
 UIDs switched from one to another at any time.  The space used per UID
 needs to be tallied continuously and needs to track every change,
 including real-time file growth and truncation.  We are ultimately
 talking about 128 bit counters here.  Instead of having one counter
 per filesystem we now have potentially hundreds of thousands, which
 represents substantial memory.

But if you already have the ZAP code, you ought to be able to do
quick lookups of arbitrary byte sequences, right?  Just assume that
a value not stored is zero (or infinity, or uninitialized, as applicable),
and you have the same functionality as  the sparse quota file on ufs,
without the problems.

Besides, uid/gid/sid quotas would usually make more sense at the zpool level
than at the individual filesystem level, so perhaps it's not _that_ bad.  Which
is to say, you want user X to have an n GB quota over the whole zpool, and you
probably don't so much care whether the filesystem within the zpool
corresponds to his home directory or to some shared directory.

 Multicore systems have the additional challenge that this complex
 information needs to be effectively shared between cores.  Imagine if
 you have 512 CPU cores, all of which are running some of the ZFS code
 and have their own caches which become invalidated whenever one of
 those counters is updated.  This sounds like a no-go for an almost
 infinite-sized pooled last word filesystem like ZFS.
 
 ZFS is already quite lazy at evaluating space consumption.  With ZFS,
 'du' does not always reflect true usage since updates are delayed.

Whatever mechanism can check at block allocation/deallocation time to keep
track of per-filesystem space (vs a filesystem quota, if there is one) could
surely also do something similar against per-uid/gid/sid quotas.  I suspect a
lot of existing functions and data structures could be reused or adapted for
most of it.  Just one more piece of metadata to update, right?  Not as if ufs
quotas had zero runtime penalty if enabled.  And you only need counters and
quotas in-core for identifiers applicable to in-core znodes, not for every
identifier used on the zpool.

Maybe I'm off base on the details.  But in any event, I expect that it's
entirely possible to make it happen, scalably.  Just a question of whether it's
worth the cost of designing, coding, testing, documenting.  I suspect there may
be enough scenarios for sites with really high numbers of accounts (particularly
universities, which are not only customers in their own right, but a chance for
future mindshare) that it might be worthwhile, but I don't know that to be the
case.

IMO, even if no one sort of site using existing deployment architectures would
justify it, given the future blurring of server, SAN, and NAS (think recent
SSD announcement + COMSTAR + iSCSI initiator + separate device for zfs
zil & cache + in-kernel CIFS + enterprise authentication with Windows
interoperability + Thumper + ...), the ability to manage all that storage in all
sorts of as-yet unforeseen deployment configurations _by user or other identity_
may well be important across a broad base of customers.  Maybe identity-based,
as well as filesystem-based, quotas should be part of that.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Richard L. Hamilton
  btw: it seems to me that this thread is a little bit OT.
 
 I don't think it's OT - because SSDs make perfect sense as ZFS log
 and/or cache devices.  If I did not make that clear in my OP then I
 failed to communicate clearly.  In both these roles (log/cache)
 reliability is of the utmost importance.

Older SSDs (before cheap and relatively high-cycle-limit flash)
were RAM cache+battery+hard disk.  Surely RAM+battery+flash
is also possible; the battery only needs to keep the RAM alive long
enough to stage to the flash.  That keeps the write count on the flash
down, and the speed up (RAM being faster than flash).  Such a device
would of course cost more, and be less dense (given having to have
battery+charging circuits and RAM as well as flash), than a pure flash device.
But with more limited write rates needed, and no moving parts, _provided_
it has full ECC and maybe radiation-hardened flash (if that exists), I can't
imagine why such a device couldn't be exceedingly reliable and have quite
a long lifetime (with the battery, hopefully replaceable, being more of
a limitation than the flash).

It could be a matter of paying for how much quality you want...

As for reliability, from zpool(1m):

  log

    A separate intent log device. If more than one log device is specified, then
    writes are load-balanced between devices. Log devices can be mirrored.
    However, raidz and raidz2 are not supported for the intent log. For more
    information, see the “Intent Log” section.

  cache

    A device used to cache storage pool data. A cache device cannot be mirrored
    or part of a raidz or raidz2 configuration. For more information, see the
    “Cache Devices” section.

  [...]

  Cache Devices

    Devices can be added to a storage pool as “cache devices.” These devices
    provide an additional layer of caching between main memory and disk. For
    read-heavy workloads, where the working set size is much larger than what
    can be cached in main memory, using cache devices allow much more of this
    working set to be served from low latency media. Using cache devices
    provides the greatest performance improvement for random read-workloads of
    mostly static content.

    To create a pool with cache devices, specify a “cache” vdev with any number
    of devices. For example:

      # zpool create pool c0d0 c1d0 cache c2d0 c3d0

    Cache devices cannot be mirrored or part of a raidz configuration. If a read
    error is encountered on a cache device, that read I/O is reissued to the
    original storage pool device, which might be part of a mirrored or raidz
    configuration.

    The content of the cache devices is considered volatile, as is the case
    with other system caches.

That tells me that the zil can be mirrored and zfs can recover from cache 
errors.

I think that means that these devices don't need to be any more reliable than
regular disks, just much faster.

So...expensive ultra-reliability SSD, or much less expensive SSD plus mirrored
zil?  Given what zfs can do with cheap SATA, my bet is on the latter...
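
(To make that concrete, a hedged sketch using current zpool syntax; the pool and
device names are invented. The slog is mirrored, while the cache device is left
unmirrored on purpose, since a failed cache read is simply reissued to the main
pool:)

  # zpool add tank log mirror c4t0d0 c5t0d0
  # zpool add tank cache c6t0d0
  # zpool status tank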
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Cruft left after update

2008-06-11 Thread Yiannis
Hi

after updating to snv_90 (several retries before I patched pkg) I was left with 
the following:

NAME                                                     USED  AVAIL  REFER  MOUNTPOINT
rpool                                                   9.87G  24.6G    62K  /rpool
[EMAIL PROTECTED]                                       19.5K      -    55K  -
rpool/ROOT                                              7.96G  24.6G    18K  /rpool/ROOT
rpool/[EMAIL PROTECTED]                                   15K      -    18K  -
rpool/ROOT/opensolaris                                  55.7M  24.6G  2.95G  legacy
rpool/ROOT/opensolaris-10                               7.91G  24.6G  4.44G  legacy
rpool/ROOT/[EMAIL PROTECTED]                            8.56M      -  2.22G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-01-07:14:04      4.26M      -  2.36G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-01-08:08:11      5.35M      -  2.95G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-07-20:01:54      97.5K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-00:28:37        56K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-00:38:34       120K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-00:56:46        76K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-01:06:44       121K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-06:00:33      3.51M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-06:39:31      1.62M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-07:38:23        74K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-07:55:15        59K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-08:47:22        49K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-09:47:45      2.21M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-10:33:50      2.88M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-13:18:02      2.86M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-14:16:02      1.98M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-15:11:06       967K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-15:23:41      1.01M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-16:43:01       925K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-17:05:32       925K      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-20:10:03      5.05M      -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-23:04:19      6.47M      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-12:12:05       238K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-12:58:23       160K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-13:35:41        28K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-14:33:15       224K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-15:13:10        82K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-15:32:31        98K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-15:44:04        82K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-16:11:51       108K      -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-19:25:12      12.8M      -  4.15G  -
rpool/ROOT/opensolaris-10/opt                           1.76G  24.6G  1.76G  /opt
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]               72K      -  3.61M  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-01-07:14:04    39K   -   595M  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-01-08:08:11    48K   -   622M  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-08-00:28:37   510K   -  1.76G  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-08-23:04:19   177K   -  1.76G  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-09-14:33:15   118K   -  1.76G  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-09-19:25:12   161K   -  1.76G  -
rpool/ROOT/opensolaris/opt                                  0  24.6G   622M  /opt
rpool/export                                            1.90G  24.6G    19K  /export
rpool/[EMAIL PROTECTED]                                   15K      -    19K  -
rpool/export/home                                       1.90G  24.6G  1.90G  /export/home
rpool/export/[EMAIL PROTECTED]                            18K      -    21K  -

Which one of these can I clear?
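
(A cautious sketch of one way to look at it; the snapshot name below is a
placeholder, not one of the real names above. Individual snapshots can be
removed with zfs destroy, and ZFS refuses to destroy a snapshot that is still
the origin of an existing clone, so the snapshot your current BE was cloned
from cannot be deleted by accident -- but list first and keep a known-good BE:)

  # zfs list -r -t snapshot rpool/ROOT
  # zfs destroy rpool/ROOT/opensolaris-10@some-old-snapshot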

Thanks
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Richard L. Hamilton
 On Tue, Jun 10, 2008 at 11:33:36AM -0700, Wyllys Ingersoll wrote:
  I'm running build 91 with ZFS boot.  It seems that ZFS will not allow
  me to add an additional partition to the current root/boot pool
  because it is a bootable dataset.  Is this a known issue that will be
  fixed or a permanent limitation?
 
 The current limitation is that a bootable pool be limited to one disk or
 one disk and a mirror.  When your data is striped across multiple disks,
 that makes booting harder.
 
 From a post to zfs-discuss about two months ago:
 
  ... we do have plans to support booting from RAID-Z.  The design is
  still being worked out, but it's likely that it will involve a new
  kind of dataset which is replicated on each disk of the RAID-Z pool,
  and which contains the boot archive and other crucial files that the
  booter needs to read.  I don't have a projected date for when it will
  be available.  It's a lower priority project than getting the install
  support for zfs boot done.
 
 - Darren

If I read you right, with little or nothing extra, that would enable
growing rpool as well, since what it would really do is ensure
/boot (and whatever if anything else) was mirrored even though
the rest of the zpool was raidz or raidz2; which would also
ensure that those critical items were _not_ spread across the
stripe that would result from adding devices to an existing zpool.

Of course installation and upgrade would have to be able to recognize
and deal with such exotica too.  Which seems to pose a problem, since
having one dataset in the zpool mirrored while the rest is raidz and/or
extended by a stripe implies to me that some space is more or less
reserved for that purpose, or that such a dataset couldn't be snapshotted,
or both; so I suppose there might be a smaller-than-total-capacity limit
on the number of BEs possible.

http://en.wikipedia.org/wiki/TANSTAAFL ...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Wyllys Ingersoll
I'm not even trying to stripe it across multiple disks, I just want to add 
another partition (from the same physical disk) to the root pool.  Perhaps that 
is a distinction without a difference, but my goal is to grow my root pool, not 
stripe it across disks or enable raid features (for now).

Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the 
pool, but can't.

-Wyllys
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs promote and ENOSPC (+panic with dtrace)

2008-06-11 Thread Mike Gerdts
On Wed, Jun 11, 2008 at 12:58 AM, Robin Guo [EMAIL PROTECTED] wrote:
 Hi, Mike,

  It's like 6452872, it need enough space for 'zfs promote'

Not really -  in 6452872 a file system is at its quota before the
promote is issued. I expect that a promote may cause several KB of
metadata changes that require some space and as such would require
more space than the quota.

In my case, quotas are not in use.  I had over 1.8 GB free before I
issued the zfs promote and fully expected to have roughly the same
amount of space free after the promote.  It seems as though an incorrect
comparison of the amount of required free space is being made.
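
(For reference, a hypothetical minimal reproduction of the kind of setup being
described, using a small file-backed pool; all names and sizes here are made up:)

  # mkfile 512m /var/tmp/tpool.img
  # zpool create tpool /var/tmp/tpool.img
  # zfs create tpool/orig
  # mkfile 200m /tpool/orig/data
  # zfs snapshot tpool/orig@snap
  # zfs clone tpool/orig@snap tpool/clone
  # mkfile 250m /tpool/clone/fill
  # zfs promote tpool/clone
  (with less free space left than the origin snapshot's REFER, the promote
  fails with "out of space", matching the behavior described above)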

I have been able to reproduce - but then when I started poking at it
with dtrace (no destructive actions) I got a panic.

# mdb *.0
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc
scsi_vhci zfs random ip hook neti sctp arp usba fctl md lofs sppp
crypto ptm ipc fcp fcip cpc logindmux sv nsctl sdbc ufs rdc ii nsmb ]
 ::status
debugging crash dump vmcore.0 (32-bit) from indy2
operating system: 5.11 snv_86 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=e0620d38 addr=200 occurred in module
unknown due to a NULL pointer dereference
dump content: kernel pages only
 ::stack
0x200(eb1ea000)
zfs_ioc_promote+0x3b()
zfsdev_ioctl+0xd8(2d8, 5a23, 8045e40, 13, e8b3a020, e0620f78)
cdev_ioctl+0x2e(2d8, 5a23, 8045e40, 13, e8b3a020, e0620f78)
spec_ioctl+0x65(ddfb6c00, 5a23, 8045e40, 13, e8b3a020, e0620f78)
fop_ioctl+0x49(ddfb6c00, 5a23, 8045e40, 13, e8b3a020, e0620f78)
ioctl+0x155()
sys_call+0x10c()


The dtrace command that I was running was:

dtrace -n 'fbt:zfs:dsl_dataset_promote:return { trace(arg0); stack() }'

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Quota question

2008-06-11 Thread Glaser, David
Hi all, I'm new to the list and I thought I'd start out on the right foot. ZFS 
is great, but I have a couple of questions.

I have a Try-n-buy x4500 with one large zfs pool with 40 1TB drives in it. The 
pool is named backup.

Of this pool, I have a number of volumes.

backup/clients
backup/clients/bob
backup/clients/daniel
...

Now bob and daniel are populated by rsync over ssh to synchronize filesystems 
with client machines (the data will then be written to an SL500). I'd like to 
set the quota on backup/clients to some arbitrarily small amount. Seems pretty 
handy, since nothing should go into backup/clients directly but only into the 
volumes backup/clients/*. But when I set the quota on backup/clients, I am 
unable to increase the quota for the sub-volumes (bob, daniel, etc.).

Any ideas if this is possible or how to do it?
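
(For what it's worth, a hedged illustration of the behavior in question; the
sizes are invented. A quota on a dataset caps that dataset and everything below
it, so a deliberately small quota on backup/clients also caps bob and daniel
regardless of their own quota settings. If your build has the newer refquota
property, it limits only the space the dataset itself references and leaves the
children free to use their own quotas:)

  # zfs set quota=1G backup/clients
  # zfs set quota=500G backup/clients/bob
  (usage under backup/clients/bob is still bounded by the parent's 1G quota)
  # zfs set quota=none backup/clients
  # zfs set refquota=1G backup/clients
  (if refquota is available: only data stored directly in backup/clients is
  limited, so the per-client quotas now have full effect)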

Thanks
Dave


David Glaser
Systems Administrator
LSA Information Technology
University of Michigan

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Kyle McDonald
Wyllys Ingersoll wrote:
 I'm not even trying to stripe it across multiple disks, I just want to add 
 another partition (from the same physical disk) to the root pool.  Perhaps 
 that is a distinction without a difference, but my goal is to grow my root 
 pool, not stripe it across disks or enable raid features (for now).

 Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the 
 pool, but can't.

   
DANGER: Uncharted territory!!!

That said, if the space on the disk (for the 2 partitions) is contiguous 
(which it doesn't appear to be in your case), or could be made 
contiguous by moving some other slice out of the way, then one way you 
should be able to grow the root pool (note: I haven't tried this, and there 
is a chance for human error to mess things up even if it will work - and 
some chance it won't work even if you do it perfectly) is by deleting the 
new (second) partition and redefining the original partition to extend 
across the space of both partitions.

Once that's done, a zpool replace c1t0d0sX c1t0d0sX should notify ZFS 
that the slice is bigger, and it will grow the pool to match.

You have s4 and s0, so I bet the space is not contiguous, and I'd guess 
the free space is earlier on the disk, not later. You might be able to 
get around that by mirroring s4 to s0 first then detaching s4, so that 
you're only using s0 and the beginning of the disk... but that's just 
more changes that could introduce problems.

Needless to say, I wouldn't try this on a system I really needed without:

1) Really good backups!

and possibly,

2) Trying it out first on a virtual machine, or different HW.

Personally, unless I really wanted to prove I could do it, I'd just 
backup and reinstall. ;) sorry.

   -Kyle

 
 -Wyllys
  
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Richard L. Hamilton
 I'm not even trying to stripe it across multiple disks, I just want to add
 another partition (from the same physical disk) to the root pool.  Perhaps
 that is a distinction without a difference, but my goal is to grow my root
 pool, not stripe it across disks or enable raid features (for now).
 
 Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the
 pool, but can't.
 
 -Wyllys

Right, that's how it is right now (which the other guy seemed to
be suggesting might change eventually, but nobody knows when
because it's just not that important compared to other things).

AFAIK, if you could shrink the partition whose data is after
c1t0d0s4 on the disk, you could grow c1t0d0s4 by that much,
and I _think_ zfs would pick up the growth of the device automatically.
(ufs partitions can be grown like that, or by being on an SVM or VxVM
volume that's grown, but then one has to run a command specific to ufs
to grow the filesystem to use the additional space).
I think zpools are supposed to grow automatically if SAN LUNs are grown,
and this should be a similar situation, anyway.  But if you can do that,
and want to try it, just be careful.  And of course you couldn't shrink it 
again, either.
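
(For comparison, and only as an example: the ufs step Richard refers to is
growfs(1M), run after the underlying slice or volume has been enlarged; the
mount point and device below are invented. For zfs, per this thread, the pool
should pick up the larger device on its own or after a zpool replace of the
slice with itself.)

  # growfs -M /export /dev/rdsk/c1t0d0s7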
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SMC Webconsole 3.1 and ZFS Administration 1.0 - stacktraces in snv_b89

2008-06-11 Thread Rick
Yeah. The command line works fine. Thought it to be a bit curious that there 
was an issue with the HTTP interface. It's low priority I guess because it 
doesn't impact the functionality really.

Thanks for the responses.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA controller suggestion

2008-06-11 Thread Lee
If you're worried about the bandwidth limitations of putting something like the 
supermicro card in a pci slot, how about using an active riser card to convert 
from PCI-E to PCI-X? One of these, or something similar:

http://www.tyan.com/product_accessories_spec.aspx?pid=26

on sale at

http://www.amazon.com/dp/B000OH5J9G?smid=ATVPDKIKX0DERtag=nextag-ce-tier2-20linkCode=asn

I'm sure you can find something similar for less, and I have seen ones that go 
from PCI-E x16 to several PCI-X as well. That and the supermicro are under half 
the price of the cheapest LSI PCI-E card.

Lee
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA controller suggestion

2008-06-11 Thread Tim
On Wed, Jun 11, 2008 at 10:18 AM, Lee [EMAIL PROTECTED] wrote:

 If your worried about the bandwidth limitations of putting something like
 the supermicro card in a pci slot how about using an active riser card to
 convert from PCI-E to PCI-X. One of these, or something similar:

 http://www.tyan.com/product_accessories_spec.aspx?pid=26

 on sale at


 http://www.amazon.com/dp/B000OH5J9G?smid=ATVPDKIKX0DERtag=nextag-ce-tier2-20linkCode=asn

 I'm sure you can find something similar for less, and I have seen ones that
 go from PCI-E x16 to several PCI-X as well. That and the supermicro are
 under half the price of the cheapest LSI PCI-E card.

 Lee





Are those universal though?  I was under the impression it had to be
supported by the motherboard, or you'd fry all components involved.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?

2008-06-11 Thread Darren J Moffat
Richard L. Hamilton wrote:
 Whatever mechanism can check at block allocation/deallocation time
 to keep track of per-filesystem space (vs a filesystem quota, if there is one)
 could surely also do something similar against per-uid/gid/sid quotas.  I 
 suspect
 a lot of existing functions and data structures could be reused or adapted for
 most of it.  Just one more piece of metadata to update, right?  Not as if ufs
 quotas had zero runtime penalty if enabled.   And you only need counters and
 quotas in-core for identifiers applicable to in-core znodes, not for every
 identifier used on the zpool.

The current quota system does its checking of quota constraints in the 
DSL (dsl_sync_task_group_sync ends up getting the quota check made).

A user-based quota system would, I believe, need to be in the ZPL, because 
that is where we understand what users are.  I suspect this means that 
such quotas would probably be easiest implemented per dataset rather than 
per pool.


-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Bob Friesenhahn
On Wed, 11 Jun 2008, Al Hopper wrote:
 disk drives.  But - based on personal observation - there is a lot of
 hype surrounding SSD reliability.  Obviously the *promise* of this
 technology is higher performance and *reliability* with lower power
 requirements due to no (mechanical) moving parts.  But... if you look
 broadly at the current SSD product offerings, you see: a) lower than
 expected performance - particularly in regard to write IOPS (I/O Ops
 per Second) and b) warranty periods that are typically 1 year - with
 the (currently rare) exception of products that are offered with a 5
 year warranty.

Other than the fact that SSDs eventually wear out from use, SSDs are 
no different from any other electronic device in that the number of 
individual parts, and the individual reliability of those parts, 
results in an overall reliability factor for the subsystem comprised 
of those parts.  SSDs are jam-packed with parts.  In fact, if you were 
to look inside an SSD and then look at how typical computers are 
implemented these days, you will see that one SSD has a whole lot more 
complex parts than the rest of the computer.

SSDs will naturally become more reliable as their parts count is 
reduced due to higher integration and product maturity.  Large SSD 
storage capacity requires more parts, so large storage devices have 
less reliability than smaller devices comprised of similar parts.

SSDs are good for laptop reliability since hard drives tend to fail 
with high shock levels and laptops are often severely abused.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA controller suggestion

2008-06-11 Thread Lee
I don't think so, not all of them anyway. They also sell ones that have a 
proprietary goldfinger, which obviously would not work. 

The spec does not mention any specific restrictions, just lists the interface 
types (but it is fairly brief), and you can certainly buy generic PCI to PCI-E 
adapters:

http://virtuavia.eu/shop/pci-express-to-pci-adapter-p29855.html

Which use a similar bridge chip.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Andy Lubel

On Jun 11, 2008, at 11:35 AM, Bob Friesenhahn wrote:

 On Wed, 11 Jun 2008, Al Hopper wrote:
 disk drives.  But - based on personal observation - there is a lot of
 hype surrounding SSD reliability.  Obviously the *promise* of this
 technology is higher performance and *reliability* with lower power
 requirements due to no (mechanical) moving parts.  But... if you look
 broadly at the current SSD product offerings, you see: a) lower than
 expected performance - particularly in regard to write IOPS (I/O Ops
 per Second) and b) warranty periods that are typically 1 year - with
 the (currently rare) exception of products that are offered with a 5
 year warranty.

 Other than the fact that SSDs eventually wear out from use, SSDs are
 no different from any other electronic device in that the number of
 individual parts, and the individual reliability of those parts,
 results in an overall reliability factor for the subsystem comprised
 of those parts.  SSDs are jam-packed with parts.  In fact, if you were
 to look inside an SSD and then look at how typical computers are
 implemented these days, you will see that one SSD has a whole lot more
 complex parts than the rest of the computer.

 SSDs will naturally become more reliable as their parts count is
 reduced due to higher integration and product maturity.  Large SSD
 storage capacity requires more parts so large storage devices have
 less relability than smaller devices comprised of similar parts.

 SSDs are good for laptop reliability since hard drives tend to fail
 with high shock levels and laptops are often severely abused.

Yeah, I was going to add the fact that they don't spin at 7k+ rpm and  
have no 'moving' parts.  I do agree that there is a lot of circuitry  
involved, and eventually they will reduce that just like they did with  
mainboards.  Remember how packed those used to be?

Either way, I'm really interested in the vendor and technology Sun  
will choose for providing these SSDs in systems or as an add-on  
card/drive.

-Andy



 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?

2008-06-11 Thread Bob Friesenhahn
On Wed, 11 Jun 2008, Richard L. Hamilton wrote:

 But if you already have the ZAP code, you ought to be able to do
 quick lookups of arbitrary byte sequences, right?  Just assume that
 a value not stored is zero (or infinity, or uninitialized, as applicable),
 and you have the same functionality as  the sparse quota file on ufs,
 without the problems.

I don't know anything about ZAP code but I do know that CPU caches 
are only so large and there can be many caches for the same data since 
each CPU has its own cache.  Some of us do actual computing using 
these same CPUs so it would be nice if they weren't entirely consumed 
by the filesystem.

Current application performance on today's hardware absolutely sucks 
compared to its theoretical potential.  Let's try to improve that 
performance rather than adding more cache thrashing, leading to more 
wait states.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Mertol Ozyoney
Hi All ;

Every NAND-based SSD has some RAM. Consumer-grade products will have smaller, 
non-battery-protected RAM, a smaller number of NAND chips working in parallel, 
and a slower CPU to distribute the load. Consumer products will also have 
fewer spare cells.


Enterprise SSDs are generally composed of several NAND devices and a lot of 
spare cells, controlled by a fast microcontroller which also has some cache and 
a supercapacitor to protect that cache.

Regardless of NAND write-cycle capability, vendors can design a reliable SSD 
by incorporating more spare cells into the design.

Mertol


Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA controller suggestion

2008-06-11 Thread Brandon High
On Wed, Jun 11, 2008 at 8:21 AM, Tim [EMAIL PROTECTED] wrote:
 Are those universal though?  I was under the impression it had to be
 supported by the motherboard, or you'd fry all components involved.

There are PCI/PCI-X to PCI-e bridge chips available (as well as PCI-e
to AGP) and they're part of the spec. As to how well they actually
work on a separate riser card, I'm not sure. I like the idea though.

This board looks decent if you need a ton of drives. The second x16
slot is actually x4 electrical, but that's not too shabby for a $100
mobo.
http://www.newegg.com/Product/Product.aspx?Item=N82E16813128335

-B

-- 
Brandon High [EMAIL PROTECTED]
The good is the enemy of the best. - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Al Hopper
On Wed, Jun 11, 2008 at 10:35 AM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Wed, 11 Jun 2008, Al Hopper wrote:

 disk drives.  But - based on personal observation - there is a lot of
 hype surrounding SSD reliability.  Obviously the *promise* of this
 technology is higher performance and *reliability* with lower power
 requirements due to no (mechanical) moving parts.  But... if you look
 broadly at the current SSD product offerings, you see: a) lower than
 expected performance - particularly in regard to write IOPS (I/O Ops
 per Second) and b) warranty periods that are typically 1 year - with
 the (currently rare) exception of products that are offered with a 5
 year warranty.

 Other than the fact that SSDs eventually wear out from use, SSDs are no
 different from any other electronic device in that the number of individual
 parts, and the individual reliability of those parts, results in an overall
 reliability factor for the subsystem comprised of those parts.  SSDs are
 jam-packed with parts.  In fact, if you were to look inside an SSD and then
 look at how typical computers are implemented these days, you will see that
 one SSD has a whole lot more complex parts than the rest of the computer.

Agreed - but the effect on overall system reliability is dominated by
the required number of interconnections (soldered joints etc), rather
than the total number of parts.  But we're drifting OT here...

 SSDs will naturally become more reliable as their parts count is reduced due
 to higher integration and product maturity.  Large SSD storage capacity
 requires more parts so large storage devices have less relability than
 smaller devices comprised of similar parts.

Again - agreed - but the root problem being addressed is the
reduction in the number of *interconnections* - which is directly
related to the number of parts.

 SSDs are good for laptop reliability since hard drives tend to fail with
 high shock levels and laptops are often severely abused.

My personal experience, echoed by numerous others I've talked with,
is that a typical laptop drive dies in 18 months - whether the laptop
travels or stays fixed on a desktop with occasional travel, for
example, in the office all week and brought home for the weekend.  For
most laptops, the real enemy of laptop disk drive reliability is the
operation of the drive at elevated temperatures common inside a laptop
- rather than vibration/shock.   I don't remember the number - but a
vast number of laptops spend the vast majority of their time glued
to a desk.

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Al Hopper
On Wed, Jun 11, 2008 at 4:31 AM, Adam Leventhal [EMAIL PROTECTED] wrote:
 On Jun 11, 2008, at 1:16 AM, Al Hopper wrote:

 But... if you look
 broadly at the current SSD product offerings, you see: a) lower than
 expected performance - particularly in regard to write IOPS (I/O Ops
 per Second)

 True. Flash is quite asymmetric in its performance characteristics.
 That said, the L2ARC has been specially designed to play well with the
 natural strengths and weaknesses of flash.

 and b) warranty periods that are typically 1 year - with
 the (currently rare) exception of products that are offered with a 5
 year warranty.

 You'll see a new class of SSDs -- eSSDs -- designed for the enterprise
 with longer warranties and more write/erase cycles. Further, ZFS will
 do its part by not killing the write/erase cycles of the L2ARC by
 constantly streaming as fast as possible. You should see lifetimes in
 the 3-5 year range on typical flash.

 Obviously, for SSD products to live up to the current marketing hype,
 they need to deliver superior performance and *reliability*.
 Everyone I know *wants* one or more SSD devices - but they also have
 the expectation that those devices will come with a warranty at least
 equivalent to current hard disk drives (3 or 5 years).

 I don't disagree entirely, but as a cache device flash actually can be
 fairly unreliable and we'll pick it up in ZFS.

 So ... I'm interested in learning from anyone on this list, and, in
 particular, from Team ZFS, what the reality is regarding SSD
 reliability.  Obviously Sun employees are not going to compromise
 their employment and divulge upcoming product specific data - but
 there must be *some* data (white papers etc) in the public domain that
 would provide some relevant technical data??


 A typical high-end SSD can sustain 100k write/erase cycles so you can
 do some simple math to see that a 128GB device written to at a rate of
 150 MB/s will last nearly 3 years. Again, note that unreliable devices
 will result in a performance degradation when you fail a checksum in
 the L2ARC, but the data will still be valid out of the main storage
 pool.

 You're going to see much more on this in the next few months. I made a
 post to my blog that probably won't answer your questions directly, but
 may help inform you about what we have in mind.

  http://blogs.sun.com/ahl/entry/flash_hybrid_pools_and_future

 Adam

 --
 Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl



Ahh Haa!  So this is the secret project (probably one of many) that
you guys have been working on!  :)   Great post, and I really
appreciate how this thread has provided lots of interesting stuff to
think about.
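
For example, a quick back-of-the-envelope check of the lifetime math quoted
above (just a sketch; it assumes perfect wear levelling, 100k write/erase
cycles per cell, and continuous writes at 150 MB/s):

  awk 'BEGIN {
      gb = 128; cycles = 100000; mbps = 150
      secs = gb * 1024 * cycles / mbps          # total seconds of writing
      printf("%.0f days, about %.1f years\n", secs/86400, secs/(86400*365))
  }'

which prints roughly 1011 days - a bit under 3 years, as Adam says.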

I think that I'll (personally) avoid the initial rush-to-market
consumer-level products by vendors with no track record of high-tech
software development - let alone those who probably can't afford the
PhD-level talent it takes to get the wear-leveling algorithms
correct, and then to implement them correctly.  Instead I'll wait for
a Sun product - from a company with a track record of proven design
and *implementation* for enterprise-level products (software and
hardware).

Otherwise, I think that I would be really upset with an SSD device
that died every 2+ years - even if it has a 5-year warranty.  No one I
know would tolerate that kind of system disruption from today's hard
disk drives - despite anticipated failures.  It's aggravation
that most production-oriented systems can simply do without!

Again - thanks to all contributors for this interesting thread.

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Wyllys Ingersoll
Luckily, my system had a pair of identical 232GB disks.  The second wasn't yet 
used, so by juggling mirrors (create 3 mirrors, detach the one to change, 
etc...), I was able to reconfigure my disks more to my liking - all without a 
single reboot or loss of data.  I now have 2 pools - a 20GB root pool and a 
210GB other pool, each mirrored on the other disk.   If not for the extra 
disk and the wonderful zfs snapshot/send/receive feature it would have taken a 
lot more time and aggravation to get it straightened out.
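
Roughly, the juggling boiled down to something like this (just a sketch; the
pool names, slice numbers and sizes here are illustrative, not the exact ones
I used):

  # slice up the unused second disk (s0 ~20GB for the new root pool,
  # s3 ~210GB for the data pool) with format/partition, then:
  zpool create datapool c1t1d0s3

  # copy datasets over with snapshot + send/receive
  zfs snapshot rpool/export/home@move
  zfs send rpool/export/home@move | zfs receive datapool/home

  # once the first disk has been freed up and re-sliced the same way,
  # attach its slices so each pool ends up mirrored across both disks
  zpool attach datapool c1t1d0s3 c1t0d0s3
  zpool status datapool        # wait for the resilver to finish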

-Wyllys
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Vincent Fox
Your key problem is going to be:

Will Sun use SLC or MLC?

From what I have read, the trend now is towards MLC chips, which have a much
lower number of write cycles but are cheaper and offer more storage.  So
vendors end up layering ECC and wear-levelling on top to address the shortened
life-span.  A lot of the large USB thumb-drives now are MLC, and the
appallingly slow write speeds and shortened lifespan are a problem.  Older
4-gig and 8-gig SLC versions of the same devices are superior.  Hard to find
this info though.

I would use any SSD in a mirror assuming there WILL be cells going out over 
time.  This would be true for me with both boot drives and slog devices.  You 
can also mirror the log device in ZFS.  I'm not really clear, though, whether the log 
device is a huge enough win on SSD to warrant all this trouble.  Depends on 
your application.  I think if performance were my god I would chase after 
something like a RAMSAN device instead for logging.  RAM-based performance with 
disk as backing-store.
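
For what it's worth, adding a mirrored log device is a one-liner (a sketch;
the pool and device names are made up):

  # add a mirrored separate intent log (slog) to an existing pool
  zpool add tank log mirror c2t0d0 c2t1d0
  # cache (L2ARC) devices, by contrast, can't be mirrored; ZFS just falls
  # back to the main pool if a cache device returns bad data or dies
  zpool add tank cache c2t2d0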

YMMV.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Bill Sommerfeld

On Wed, 2008-06-11 at 07:40 -0700, Richard L. Hamilton wrote:
  I'm not even trying to stripe it across multiple
  disks, I just want to add another partition (from the
  same physical disk) to the root pool.  Perhaps that
  is a distinction without a difference, but my goal is
  to grow my root pool, not stripe it across disks or
  enable raid features (for now).
  
  Currently, my root pool is using c1t0d0s4 and I want
  to add c1t0d0s0 to the pool, but can't.
  
  -Wyllys
 
 Right, that's how it is right now (which the other guy seemed to
 be suggesting might change eventually, but nobody knows when
 because it's just not that important compared to other things).
 
 AFAIK, if you could shrink the partition whose data is after
 c1t0d0s4 on the disk, you could grow c1t0d0s4 by that much,
 and I _think_ zfs would pick up the growth of the device automatically.

This works.  ZFS doesn't notice the size increase until you reboot.

I've been installing systems over the past year with a slice arrangement
intended to make it easy to go to zfs root:

s0 with a ZFS pool at start of  disk
s1 swap
s3 UFS boot environment #1
s4 UFS boot environment #2
s7 SVM metadb (if mirrored root)

I was happy to discover that this paid off.  Once I upgraded a BE to
nv_90 and was running on it, it was a matter of:

lucreate -p $pool -n nv_90zfs
luactivate nv_90zfs

init 6  (reboot)

ludelete other BE's

format
format partition
delete slices other than s0
grow s0 to full disk

reboot

and you're all ZFS all the time.

- Bill

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?

2008-06-11 Thread Vincent Fox
This is one of those issues where the developers generally seem to think that 
old-style quotas are legacy baggage, and that people running large 
home-directory servers with 10,000+ users are a minority that can 
safely be ignored.

I can understand their thinking.  However, it does represent a problem here at 
the University of California, Davis.  I would love to replace our Solaris 9 
home-directory server with one running Solaris 10 and ZFS.  A past issue with 
UFS corruption keeps us all nervous.  But there is no alternative to 
UFS+quotas yet, it seems.
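
(For the record, the per-dataset approach the subject line refers to looks
roughly like the loop below.  It's only a sketch - tank/home and users.txt are
made-up names - and whether it scales to 10,000+ users is exactly the open
question.)

  # one ZFS filesystem per user, each with its own quota, in place of
  # UFS-style per-user quotas on a single filesystem
  # users.txt: one login name per line, extracted however you like
  while read u; do
      zfs create -o quota=2g tank/home/"$u"
  done < users.txt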
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing root pool ?

2008-06-11 Thread Wyllys Ingersoll
I had a similar configuration until my recent re-install to snv91.  Now I 
have just 2 ZFS pools - one for root+boot (big enough to hold multiple BEs and 
do LiveUpgrades) and another for the rest of my data.

-Wyllys
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive issue

2008-06-11 Thread Nils Goroll
see: http://bugs.opensolaris.org/view_bug.do?bug_id=6700597
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

2008-06-11 Thread Adam Leventhal
On Wed, Jun 11, 2008 at 01:51:17PM -0500, Al Hopper wrote:
 I think that I'll (personally) avoid the initial rush-to-market
 consumer-level products by vendors with no track record of high-tech
 software development - let alone those who probably can't afford the
 PhD-level talent it takes to get the wear-leveling algorithms
 correct, and then to implement them correctly.  Instead I'll wait for
 a Sun product - from a company with a track record of proven design
 and *implementation* for enterprise-level products (software and
 hardware).

Wear leveling is actually a fairly mature technology. I'm more concerned
with what will happen as people continue pushing these devices out of the
consumer space and into the enterprise where stuff like failure modes and
reliability matters in a completely different way. If my iPod sucks that's
a hassle, but it's a different matter if an SSD hangs an I/O request on my
enterprise system.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS conflict with MAID?

2008-06-11 Thread Torrey McMahon
A Darren Dunham wrote:
 On Tue, Jun 10, 2008 at 05:32:21PM -0400, Torrey McMahon wrote:
   
 However, some apps will probably be very unhappy if i/o takes 60 seconds 
 to complete.
 

 It's certainly not uncommon for that to occur in an NFS environment.
 All of our applications seem to hang on just fine for minor planned and
 unplanned outages.

 Would the apps behave differently in this case?  (I'm certainly not
 thinking of a production database for such a configuration).

Some applications have their own internal timers that track i/o time 
and, if it doesn't complete in time, will error out.  I don't know which 
part of the stack the timer was in, but I've seen an Oracle RAC cluster 
on QFS time out much faster than the SCSI retries normally allow for.  (I 
think it was Oracle in that case...)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive - list contents of incremental stream?

2008-06-11 Thread Robert Lawhead
Thanks, Matt.  Are you interested in feedback on various questions regarding 
how to display results?  On list or off?  Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS root boot failure?

2008-06-11 Thread Vincent Fox
So I decided to test out failure modes of ZFS root mirrors.

Installed on a V240 with nv90.  Worked great.

Pulled out disk1, then replaced it and attached again, resilvered, all good.

Now I pull out disk0 to simulate failure there.  OS up and running fine, but 
lots of error messages about SYNC CACHE.

Next I decided to init 0, and reinsert disk 0, and reboot.  Uh oh!

Probing system devices
Probing memory
Probing I/O buses

Sun Fire V240, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.33, 8192 MB memory installed, Serial #54881337.
Ethernet address 0:3:ba:45:6c:39, Host ID: 83456c39.



Rebooting with command: boot
Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a  
File and args:
SunOS Release 5.11 Version snv_90 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: 

  ***  
  *  This device is not bootable!   *  
  *  It is either offlined or detached or faulted.  *  
  *  Please try to boot from a different device.*  
  ***  


NOTICE: 
spa_import_rootpool: error 22

Cannot mount root on /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0:a fstype zfs

panic[cpu1]/thread=180e000: vfs_mountroot: cannot mount root

0180b950 genunix:vfs_mountroot+348 (600, 200, 800, 200, 1874800, 
12b6000)
  %l0-3: 0001d524 0064 0001d4c0 1d4c
  %l4-7: 05dc 1770 0640 018c7000
0180ba10 genunix:main+b4 (1815000, 180c000, 1837240, 18151f8, 1, 
180e000)
  %l0-3: 01838258 70002000 010bfc00 
  %l4-7: 0183c400 0001 0180c000 01837c00

skipping system dump - no dump device configured
rebooting...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root boot failure?

2008-06-11 Thread Tim
Sounds correct to me.  The disk isn't sync'd, so boot should fail.  If
you pull disk0 or set disk1 as the primary boot device, what does it
do?  You can't expect it to resilver before booting.





On 6/11/08, Vincent Fox [EMAIL PROTECTED] wrote:
 So I decided to test out failure modes of ZFS root mirrors.

 Installed on a V240 with nv90.  Worked great.

 Pulled out disk1, then replaced it and attached again, resilvered, all good.

 Now I pull out disk0 to simulate failure there.  OS up and running fine, but
 lots of error messages about SYNC CACHE.

 Next I decided to init 0, and reinsert disk 0, and reboot.  Uh oh!

 Probing system devices
 Probing memory
 Probing I/O buses

 Sun Fire V240, No Keyboard
 Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 OpenBoot 4.22.33, 8192 MB memory installed, Serial #54881337.
 Ethernet address 0:3:ba:45:6c:39, Host ID: 83456c39.



 Rebooting with command: boot
 Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL 
 PROTECTED],0:a  File and args:
 SunOS Release 5.11 Version snv_90 64-bit
 Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
 Use is subject to license terms.
 NOTICE:

   ***
   *  This device is not bootable!   *
   *  It is either offlined or detached or faulted.  *
   *  Please try to boot from a different device.*
   ***


 NOTICE:
 spa_import_rootpool: error 22

 Cannot mount root on /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL 
 PROTECTED],0:a fstype zfs

 panic[cpu1]/thread=180e000: vfs_mountroot: cannot mount root

 0180b950 genunix:vfs_mountroot+348 (600, 200, 800, 200, 1874800,
 12b6000)
   %l0-3: 0001d524 0064 0001d4c0 1d4c
   %l4-7: 05dc 1770 0640 018c7000
 0180ba10 genunix:main+b4 (1815000, 180c000, 1837240, 18151f8, 1,
 180e000)
   %l0-3: 01838258 70002000 010bfc00 
   %l4-7: 0183c400 0001 0180c000 01837c00

 skipping system dump - no dump device configured
 rebooting...


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Quota question

2008-06-11 Thread Boyd Adamson
Glaser, David [EMAIL PROTECTED] writes:

 Hi all, I'm new to the list and I thought I'd start out on the right
 foot. ZFS is great, but I have a couple of questions...

 I have a Try-n-buy x4500 with one large zfs pool with 40 1TB drives in
 it. The pool is named backup.

 Of this pool, I have a number of volumes.

 backup/clients

 backup/clients/bob

 backup/clients/daniel

 ...

 Now bob and daniel are populated by rsync over ssh to synchronize
 filesystems with client machines (the data will then be written to a
 SL500).  I'd like to set the quota on backup/clients to some arbitrary
 small amount.  Seems pretty handy, since nothing should go into
 backup/clients itself but only into the volumes backup/clients/*.  But when I
 set the quota on backup/clients, I am unable to increase the quota for the
 sub-volumes (bob, daniel, etc.).

 Any ideas if this is possible or how to do it?

Sounds like you want refquota:

From: zfs(1M)

 refquota=size | none

 Limits the amount of space a dataset can  consume.  This
 property  enforces  a  hard limit on the amount of space
 used. This hard limit does not  include  space  used  by
 descendents, including file systems and snapshots.
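
A sketch of how that plays out here (the sizes are made up, and this assumes a
build new enough to have the refquota property):

  # cap what backup/clients itself can reference, without limiting its children
  zfs set refquota=10m backup/clients
  # the per-client children can then carry whatever quotas they need
  zfs set quota=500g backup/clients/bob
  zfs set quota=500g backup/clients/daniel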
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root boot failure?

2008-06-11 Thread Vincent Fox
Ummm, could you back up a bit there?

What do you mean, the disk isn't sync'd so boot should fail?  I'm coming from 
UFS, of course, where I'd expect to be able to fix a damaged boot drive when it 
drops into a single-user root prompt.

I believe I did try booting from disk1, but that failed, I think because of an 
earlier trial with it where I scrambled it with dd and then resilvered, then 
removed it, replaced it, and resilvered again.  I think I ended up with an 
unusable boot sector on disk1, but I didn't copy the message down, sorry.

I suppose all that would have been left was to boot from media or a jumpstart 
server into single-user mode and attempt repairs.  Unfortunately I have since 
re-jumpstarted the system clean.  This was plain nv90 both times, by the way, 
with no /etc/system tweaks.

I have to pull the motherboard on the V240 and replace it tomorrow; maybe on 
Friday I will be able to repeat my experiment.  I just wanted to run through 
some failure modes so I know what to expect when boot drives die on me.

Thanks!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root boot failure?

2008-06-11 Thread Richard Elling
Vincent Fox wrote:
 So I decided to test out failure modes of ZFS root mirrors.

 Installed on a V240 with nv90.  Worked great.

 Pulled out disk1, then replaced it and attached again, resilvered, all good.

 Now I pull out disk0 to simulate failure there.  OS up and running fine, but 
 lots of error messages about SYNC CACHE.

 Next I decided to init 0, and reinsert disk 0, and reboot.  Uh oh!
   

This is actually very good.  It means that ZFS recognizes that there
are two out-of-sync mirrors and that you booted from the older one.
What happens when you change the boot order?
 -- richard

 Probing system devices
 Probing memory
 Probing I/O buses

 Sun Fire V240, No Keyboard
 Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
 OpenBoot 4.22.33, 8192 MB memory installed, Serial #54881337.
 Ethernet address 0:3:ba:45:6c:39, Host ID: 83456c39.



 Rebooting with command: boot
 Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL 
 PROTECTED],0:a  File and args:
 SunOS Release 5.11 Version snv_90 64-bit
 Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
 Use is subject to license terms.
 NOTICE: 

   ***  
   *  This device is not bootable!   *  
   *  It is either offlined or detached or faulted.  *  
   *  Please try to boot from a different device.*  
   ***  


 NOTICE: 
 spa_import_rootpool: error 22

 Cannot mount root on /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL 
 PROTECTED],0:a fstype zfs

 panic[cpu1]/thread=180e000: vfs_mountroot: cannot mount root

 0180b950 genunix:vfs_mountroot+348 (600, 200, 800, 200, 1874800, 
 12b6000)
   %l0-3: 0001d524 0064 0001d4c0 1d4c
   %l4-7: 05dc 1770 0640 018c7000
 0180ba10 genunix:main+b4 (1815000, 180c000, 1837240, 18151f8, 1, 
 180e000)
   %l0-3: 01838258 70002000 010bfc00 
   %l4-7: 0183c400 0001 0180c000 01837c00

 skipping system dump - no dump device configured
 rebooting...
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root boot failure?

2008-06-11 Thread Richard Elling
Vincent Fox wrote:
 Ummm, could you back up a bit there?

 What do you mean disk isn't sync'd so boot should fail?  I'm coming from 
 UFS of course where I'd expect to be able to fix a damaged boot drive as it 
 drops into a single-user root prompt.

 I believe I did try boot disk1 but that failed I think due to prior trial 
 with it, where I scrambled it with dd, then resilvered.  Then removed it, 
 replaced, resilvered it.  Think I ended up with unusable boot sector on disk1 
 that didn't work but I didn't copy the message down sorry.

 I suppose all that would have been left is boot from media or jumpstart 
 server in single-user and attempt repairs.  Unfortunately I have since 
 re-jumpstarted the system clean.  This was plain nv90 both times by the way 
 no /etc/system tweaks.

 I have to pull the motherboard on the V240 and replace it tomorrow, maybe on 
 Friday I will be able to repeat my experiment.  Just wanted to run through 
 some failure-modes so I know what to expect when boot drives die on me.
   

Sequence-of-events failures are among the most common fatal
errors in complex systems.  In this case, you induced a failure
mode we call amnesia.  It works like this:

Consider a system with two (!) mirrored disks (A and B) working normally
and in sync.

At time0, disconnect disk A.  It will still contain a view of the system
state, but is not accessible by the system.

At time1, the system gives up on disk A and proceeds using disk B.
Now the two disks are no longer in sync and the data on disk B is
newer than the data on disk A.

At time2, shutdown the system. Re-attach disk A.

The correct behaviour is that disk A is old and its data should be
ignored until repaired.  Disk B should be the primary, authoritative
view of the system state.  This failure mode is called amnesia
because disk A doesn't remember the changes that should have
occurred if it had been an active, functional member of the
system.

AFAIK, SVM will not handle this problem well.  ZFS and Solaris
Cluster can detect this because the configuration metadata knows
the time difference (ZFS can detect this by the latest txg).

I predict that if you had booted from disk B, then it would have
worked (but I don't have the hardware setup to test this tonight).

NB, for those who don't know about SPARC boot sequences,
the OpenBoot program has a default boot device list and will
try the first device, then the second, and so on.  This is similar
to how most BIOSes work.  While you wouldn't normally
expect to need to worry about this, it makes a difference in the
case of amnesia.
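
Concretely, from the ok prompt you can inspect and reorder that list, or
simply point boot at the other half of the mirror (a sketch; device aliases
vary by machine):

  ok printenv boot-device
  ok setenv boot-device disk1 disk0
  ok boot disk1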
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS conflict with MAID?

2008-06-11 Thread Richard Elling
Torrey McMahon wrote:
 A Darren Dunham wrote:
   
 On Tue, Jun 10, 2008 at 05:32:21PM -0400, Torrey McMahon wrote:
   
 
 However, some apps will probably be very unhappy if i/o takes 60 seconds 
 to complete.
 
   
 It's certainly not uncommon for that to occur in an NFS environment.
 All of our applications seem to hang on just fine for minor planned and
 unplanned outages.

 Would the apps behave differently in this case?  (I'm certainly not
 thinking of a production database for such a configuration).
 

 Some applications have their own internal timers that track i/o time 
 and, if it doesn't complete in time, will error out. I don't know which 
 part of the stack the timer was in but I've seen an Oracle RAC cluster 
  on QFS time out much faster than the SCSI retries normally allow for. (I 
 think it was Oracle in that case...)

Oracle bails out after 10 minutes (ORA-27062); ask me how I know... :-P
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss