Re: [zfs-discuss] Problem booting after zfs upgrade
On Aug 5, 2011, at 8:55 PM, Edward Ned Harvey wrote:

> In any event... You need to do something like this:
>     installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
> (substitute whatever device slice you have used for rpool)

That did the trick, thanks.

Out of curiosity, does anyone know at what version you get a warning, and at what version installgrub is run automatically after upgrading a root pool/filesystem?

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
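For anyone hitting the same symptom, the fix amounts to re-running installgrub after the pool upgrade; a minimal sketch for x86, assuming the root pool lives on c1t0d0s0 (device name illustrative):

    # zpool upgrade rpool
    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0

On SPARC the analogous step is installboot with the ZFS bootblk rather than installgrub.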
[zfs-discuss] Problem booting after zfs upgrade
After upgrading to zpool version 29/zfs version 5 on a S10 test system via the kernel patch 144501-19, it will now boot only as far as the grub menu.

What is a good Solaris rescue image that I can boot that will allow me to import this rpool to look at it (given the newer version)? Thanks.
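Once booted from rescue media whose ZFS code is at least pool version 29, the inspection step is a sketch like this (the altroot keeps rpool from mounting over the live rescue environment):

    # zpool import -f -R /a rpool
    # zfs list -r rpool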
[zfs-discuss] Validating a zfs send object
How do you verify that a zfs send binary object is valid? I tried running a truncated file through zstreamdump and it completed with no error messages and an exit() status of 0. However, I noticed it was missing the final print statement with a checksum value, "END checksum = ...".

Is there any normal circumstance under which this END checksum statement will be missing? More usefully, is there an option to zstreamdump, or a similar program, to validate an internal checksum value stored in a zfs send binary object? Or is the only way to do this with zfs receive? Thanks.
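Based on the observation above, a crude completeness check is simply to look for that summary line (a heuristic only, not a cryptographic verification; the file name is illustrative):

    # zstreamdump < backup.zsend | grep 'END checksum' || echo "likely truncated"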
Re: [zfs-discuss] Partitioning ARC
On Jan 30, 2011, at 6:03 PM, Richard Elling wrote:

> On Jan 30, 2011, at 5:01 PM, Stuart Anderson wrote:
>> On Jan 30, 2011, at 2:29 PM, Richard Elling wrote:
>>> On Jan 30, 2011, at 12:21 PM, stuart anderson wrote:
>>>> Is it possible to partition the global setting for the maximum ARC size with finer grained controls? Ideally, I would like to do this on a per zvol basis, but a setting per zpool would be interesting as well.
>>> While perhaps not perfect, see the primarycache and secondarycache properties of the zvol or file system.
>> With primarycache I can turn off utilization of the ARC for some zvols, but instead I was hoping to use the ARC but limit the maximum amount on a per zvol basis.
> Just a practical question, do you think the average storage admin will have any clue as to how to use this tunable?

Yes. I think the basic idea of partitioning a memory cache over different storage objects is a straightforward concept.

> How could we be more effective in communicating the features and pitfalls of resource management at this level?

Document that this is normally handled dynamically based on the default policy that all storage objects are assigned ARC space on a fair-share basis. However, if different quality of service is required for different storage objects, this may be adjusted as follows...

>>>> The use case is to prioritize which zvol devices should be fully cached in DRAM on a server that cannot fit them all in memory.
>>> It is not clear to me that this will make sense in a world of snapshots and dedup. Could you explain your requirements in more detail?
>> I am using zvols to hold the metadata for another filesystem (SAM-QFS). In some circumstances I can fit enough of this into the ARC that virtually all metadata read IOPS happen at DRAM performance rather than SSD or slower. However, with a single server hosting multiple filesystems (hence multiple zvols) I would like to be able to prioritize the use of the ARC.
> I think there is merit to this idea. It can be especially useful in the zone context. Please gather your thoughts and file an RFE at www.illumos.org

Not sure how to file an illumos RFE, but one simple model to think about would be a two-tiered system where by default ZFS datasets use the ARC as is currently the case, with no (to the best of my knowledge) relative priority, but some objects could optionally specify a request for a minimum size, e.g., add a companion attribute to primarycache named primarycachesize. This would represent the minimum amount of ARC space that is available for that object. Some thought would have to be given as to how to indicate if the sum of all primarycachesize settings is greater than zfs_arc_max, and to document what happens in this case, e.g., are all values ignored? Presumably something similar could also be considered for secondarycache. Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
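To make the proposal concrete, here is what the suggested companion property might look like; this is purely hypothetical syntax, since primarycachesize does not exist in any ZFS release:

    # zfs set primarycache=all tank/qfsmeta1
    # zfs set primarycachesize=8G tank/qfsmeta1    (proposed: minimum ARC share for this zvol)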
Re: [zfs-discuss] Query zfs send objects
On Jan 29, 2011, at 10:00 PM, Richard Elling wrote:

> On Jan 29, 2011, at 5:48 PM, stuart anderson wrote:
>> Is there a simple way to query zfs send binary objects for basic information such as:
>> 1) What snapshot they represent?
>> 2) When they were created?
>> 3) Whether they are the result of an incremental send?
>> 4) What the baseline snapshot was, if applicable?
>> 5) What ZFS version number they were made from?
>> 6) Anything else that would be useful to keep them around as backup binary blobs on an archival system, e.g., SAM-QFS?
> zstreamdump has a -v option which will show header information. The structure of that documentation is only shown in the source, though.

Thanks for the pointer. This has most of the information I am looking for.

Do you know how to get zstreamdump to display whether it is parsing an incremental dump, and if so, what snapshot it is relative to? Put another way, given 2 zfs send binary blobs, can I determine if they are related without trying to restore them to a ZFS filesystem? Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
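For reference, dumping the headers of an incremental stream without receiving it looks like this (snapshot names illustrative):

    # zfs send -i tank/fs@monday tank/fs@tuesday | zstreamdump -v | head -20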
[zfs-discuss] Partitioning ARC
Is it possible to partition the global setting for the maximum ARC size with finer grained controls? Ideally, I would like to do this on a per zvol basis, but a setting per zpool would be interesting as well.

The use case is to prioritize which zvol devices should be fully cached in DRAM on a server that cannot fit them all in memory. Thanks.
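For context, the only ARC sizing control available at this level today is the global cap set in /etc/system, which is what this question asks to subdivide (the value is illustrative and takes effect on reboot):

    set zfs:zfs_arc_max = 0x200000000

That is an 8 GB ceiling for the entire ARC, with no per-dataset or per-pool breakdown.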
Re: [zfs-discuss] Query zfs send objects
On Jan 30, 2011, at 1:49 PM, Richard Elling wrote:

> On Jan 30, 2011, at 11:19 AM, Stuart Anderson wrote:
>> On Jan 29, 2011, at 10:00 PM, Richard Elling wrote:
>>> On Jan 29, 2011, at 5:48 PM, stuart anderson wrote:
>>>> Is there a simple way to query zfs send binary objects for basic information such as: 1) What snapshot they represent? 2) When they were created? 3) Whether they are the result of an incremental send? 4) What the baseline snapshot was, if applicable? 5) What ZFS version number they were made from? 6) Anything else that would be useful to keep them around as backup binary blobs on an archival system, e.g., SAM-QFS?
>>> zstreamdump has a -v option which will show header information. The structure of that documentation is only shown in the source, though.
>> Thanks for the pointer. This has most of the information I am looking for. Do you know how to get zstreamdump to display whether it is parsing an incremental dump, and if so, what snapshot it is relative to? Put another way, given 2 zfs send binary blobs, can I determine if they are related without trying to restore them to a ZFS filesystem?
> Each incremental send stream has a from and a to Global Unique Identifier (GUID) for the snapshots. A send stream with many incremental snapshots will have many of these pairs.

Got it. That works. Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
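In practice the pairing can be pulled straight out of the stream headers; the field names below are as zstreamdump -v prints them (file name illustrative), with fromguid = 0 indicating a full rather than incremental stream:

    # zstreamdump -v < blob.zsend | egrep 'toname|toguid|fromguid'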
Re: [zfs-discuss] Partitioning ARC
On Jan 30, 2011, at 2:29 PM, Richard Elling wrote:

> On Jan 30, 2011, at 12:21 PM, stuart anderson wrote:
>> Is it possible to partition the global setting for the maximum ARC size with finer grained controls? Ideally, I would like to do this on a per zvol basis, but a setting per zpool would be interesting as well.
> While perhaps not perfect, see the primarycache and secondarycache properties of the zvol or file system.

With primarycache I can turn off utilization of the ARC for some zvols, but instead I was hoping to use the ARC but limit the maximum amount on a per zvol basis.

>> The use case is to prioritize which zvol devices should be fully cached in DRAM on a server that cannot fit them all in memory.
> It is not clear to me that this will make sense in a world of snapshots and dedup. Could you explain your requirements in more detail?

I am using zvols to hold the metadata for another filesystem (SAM-QFS). In some circumstances I can fit enough of this into the ARC that virtually all metadata read IOPS happen at DRAM performance rather than SSD or slower. However, with a single server hosting multiple filesystems (hence multiple zvols) I would like to be able to prioritize the use of the ARC.

P.S. Since SAM-QFS metadata is highly compressible, O(10x), it would also be great if there were an option to cache the compressed blocks in DRAM (and not just the decompressed version). Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Query zfs send objects
Is there a simple way to query zfs send binary objects for basic information such as:

1) What snapshot they represent?
2) When they were created?
3) Whether they are the result of an incremental send?
4) What the baseline snapshot was, if applicable?
5) What ZFS version number they were made from?
6) Anything else that would be useful to keep them around as backup binary blobs on an archival system, e.g., SAM-QFS?

Thanks.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Apr 2, 2010, at 5:08 AM, Edward Ned Harvey wrote:

>> I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit.
> It seems like it should be unnecessary. It seems like extra work. But based on my present experience, I reached the same conclusion. If my new replacement SSD with identical part number and firmware is 0.001 Gb smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? Nothing. That's what. I take it back. Me. I am to prevent it from happening. And the technique to do so is precisely as you've said. First slice every drive to be a little smaller than actual. Then later if I get a replacement device for the mirror that's slightly smaller than the others, I have no reason to care.

However, I believe there are some downsides to letting ZFS manage just a slice rather than an entire drive, but perhaps those do not apply as significantly to SSD devices?

Thanks

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
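A sketch of the technique under discussion (device names illustrative): carve a slice just below the nominal capacity with format(1M), then mirror the slice rather than the whole disk:

    # format -e c1t2d0
      (partition -> modify: size slice 0 a little below the full capacity)
    # zpool attach tank c1t0d0s0 c1t2d0s0

The classic downside being alluded to is that ZFS only enables the on-disk write cache automatically when given a whole disk, so with slices the write cache may need to be managed by hand.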
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Edward Ned Harvey solaris2 at nedharvey.com writes:

> Allow me to clarify a little further, why I care about this so much. I have a solaris file server, with all the company jewels on it. I had a pair of intel X.25 SSD mirrored log devices. One of them failed. The replacement device came with a newer version of firmware on it. Now, instead of appearing as 29.802 Gb, it appears at 29.801 Gb. I cannot zpool attach. New device is too small.
>
> So apparently I'm the first guy this happened to. Oracle is caught totally off guard. They're pulling their inventory of X25's from dispatch warehouses, and inventorying all the firmware versions, and trying to figure it all out. Meanwhile, I'm still degraded. Or at least, I think I am. Nobody knows any way for me to remove my unmirrored log device. Nobody knows any way for me to add a mirror to it (until they can locate a drive with the correct firmware.) All the support people I have on the phone are just as scared as I am. "Well, we could upgrade the firmware of your existing drive, but that'll reduce it by 0.001 Gb, and that might just create a time bomb to destroy your pool at a later date. So we don't do it." Nobody has suggested that I simply shutdown and remove my unmirrored SSD, and power back on.

We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware and recreated the volumes on new drives arriving from Sun, we got back into sync on the X25-E device sizes.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Mar 31, 2010, at 8:58 PM, Edward Ned Harvey wrote:

>> We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware and recreated the volumes on new drives arriving from Sun, we got back into sync on the X25-E device sizes.
> Can you elaborate? Just today, we got the replacement drive that has precisely the right version of firmware and everything. Still, when we plugged in that drive, and create simple volume in the storagetek raid utility, the new drive is 0.001 Gb smaller than the old drive. I'm still hosed. Are you saying I might benefit by sticking the SSD into some laptop, and zero'ing the disk? And then attach to the sun server? Are you saying I might benefit by finding some other way to make the drive available, instead of using the storagetek raid utility?

Assuming you are also using a PCI LSI HBA from Sun that is managed with a utility called /opt/StorMan/arcconf and reports itself as the amazingly informative model number "Sun STK RAID INT", what worked for me was to run:

    arcconf delete   (to delete the pre-configured volume shipped on the drive)
    arcconf create   (to create a new volume)

What I observed was that "arcconf getconfig 1" would show the same physical device size for our existing drives and new ones from Sun, but they reported a slightly different logical volume size. I am fairly sure that was due to the Sun factory creating the initial volume with a different version of the HBA controller firmware than we were using to create our own volumes.

If I remember the sign correctly, the newer firmware creates larger logical volumes, and you really want to upgrade the firmware if you are going to be running multiple X25-E drives from the same controller. I hope that helps.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
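A quick way to compare the physical and logical sizes the controller reports, assuming controller number 1 as above (the exact output labels vary by firmware revision):

    # /opt/StorMan/arcconf getconfig 1 | egrep -i 'size'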
Re: [zfs-discuss] ZFS caching of compressed data
On Oct 2, 2009, at 11:54 AM, Robert Milkowski wrote:

> Stuart Anderson wrote:
>> On Oct 2, 2009, at 5:05 AM, Robert Milkowski wrote:
>>> Stuart Anderson wrote:
>>>> I am wondering if the following idea makes any sense as a way to get ZFS to cache compressed data in DRAM? In particular, given a 2-way zvol mirror of highly compressible data on persistent storage devices, what would go wrong if I dynamically added a ramdisk as a 3rd mirror device at boot time? Would ZFS route most (or all) of the reads to the lower latency DRAM device? In the case of an un-clean shutdown where there was no opportunity to actively remove the ramdisk from the pool before shutdown, would there be any problem at boot time when the ramdisk is still registered but unavailable? Note, this Gedanken experiment is for highly compressible (~9x) metadata for a non-ZFS filesystem.
>>> You would only get about 33% of IO's served from ram-disk.
>> With SVM you are allowed to specify a read policy on sub-mirrors for just this reason, e.g.,
>> http://wikis.sun.com/display/BigAdmin/Using+a+SVM+submirror+on+a+ramdisk+to+increase+read+performance
>> Is there no equivalent in ZFS?
> Nope, at least not right now.

Curious if anyone knows of any other ideas/plans for ZFS caching compressed data internally, or externally via a ramdisk mirror device that handles most/all read requests? Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
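The Gedanken experiment expressed as commands, a sketch only (pool and device names illustrative); as noted above, without a per-submirror read policy ZFS spreads reads across all three sides, so only ~1/3 would come from DRAM:

    # ramdiskadm -a metamirror 8g
    # zpool attach metapool c0t1d0 /dev/ramdisk/metamirror
      ...
    # zpool detach metapool /dev/ramdisk/metamirror    (before a clean shutdown)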
Re: [zfs-discuss] force 4k writes
On Dec 17, 2009, at 9:21 PM, Richard Elling wrote:

> On Dec 17, 2009, at 9:04 PM, stuart anderson wrote:
>> As a specific example of 2 devices with dramatically different performance for sub-4k transfers, has anyone done any ZFS benchmarks between the X25E and the F20 they can share? I am particularly interested in zvol performance with a blocksize of 16k and highly compressible data (~10x).
> 16 KB recordsize? That seems a little unusual, what is the application?

SAM-QFS metadata, whose fundamental disk allocation unit (DAU) size for metadata is 16kB.

>> I am going to run some comparison tests but would appreciate any initial input on what to look out for or how to tune ZFS to get the most out of the F20.
> AFAICT, no tuning should be required. It is quite fast.

>> It might be helpful, e.g., if there were somewhere in the software stack where I could tell part of the system to lie and treat the F20 as a 4k device?
> The F20 is rated at 84,000 random 4KB write IOPS. The DRAM write buffer will hide 4KB write effects.

Not from some direct vdbench comparison results I have seen. My main concern here has to do with ZFS compression, which I need for my application, breaking up the transfer sizes the F20 sees into smaller-than-4KB writes, where there is a critical performance difference. I also suspect/hope that SAM-QFS is telling ZFS to aggressively flush/commit any metadata updates to stable storage, which probably aggravates the problem, though I have not tested this yet.

> OTOH, the X-25E is rated at 3,300 random 4KB writes. It shouldn't take much armchair analysis to come to the conclusion that the F20 is likely to win that IOPS battle :-)

Though to be fair you should probably compare a single F20 DOM to an X25-E, or 4 X25-E's to a full F20, and of course my systems don't run from an armchair :)

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
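For reference, the kind of zvol under discussion can be created as follows (names and size illustrative):

    # zfs create -V 100g -b 16k -o compression=gzip-1 tank/qfsmeta
    # zfs get volblocksize,compression tank/qfsmeta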
Re: [zfs-discuss] force 4k writes
> On Wed, Dec 16 at 7:35, Bill Sprouse wrote:
>> The question behind the question is, given the really bad things that can happen performance-wise with writes that are not 4k aligned when using flash devices, is there any way to insure that any and all writes from ZFS are 4k aligned?
> Some flash devices can handle this better than others, often several orders of magnitude better. Not all devices (as you imply) are so-affected.

As a specific example of 2 devices with dramatically different performance for sub-4k transfers, has anyone done any ZFS benchmarks between the X25E and the F20 they can share? I am particularly interested in zvol performance with a blocksize of 16k and highly compressible data (~10x).

I am going to run some comparison tests but would appreciate any initial input on what to look out for or how to tune ZFS to get the most out of the F20. It might be helpful, e.g., if there were somewhere in the software stack where I could tell part of the system to lie and treat the F20 as a 4k device?

Thanks.
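One way to quantify the sub-4k penalty on a candidate device is a vdbench sweep over transfer sizes; a destructive sketch against a raw device (the device path and vdbench install location are assumptions):

    # cat > xfersweep.parm <<'EOF'
    sd=ssd,lun=/dev/rdsk/c2t1d0s0,threads=16
    wd=wr,sd=ssd,rdpct=0,seekpct=random
    rd=sweep,wd=wr,iorate=max,elapsed=60,interval=5,forxfersize=(512,2k,4k,16k)
    EOF
    # /opt/vdbench/vdbench -f xfersweep.parm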
Re: [zfs-discuss] zvol used apparently greater than volsize for sparse volume
Cindy,

Thanks for the pointer. Until this is resolved, is there some documentation available that will let me calculate this by hand? I would like to know how large the current 3-4% metadata storage I am observing can potentially grow. Thanks.

On Oct 20, 2009, at 8:57 AM, Cindy Swearingen wrote:

> Hi Stuart,
>
> The reason why used is larger than the volsize is because we aren't accounting for metadata, which is covered by this CR:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429996
> 6429996 zvols don't reserve enough space for requisite meta data
>
> Metadata is usually only a small percentage. Sparse-ness is not a factor here. Sparse just means we ignore the reservation so you can create a zvol bigger than what we'd normally allow.
>
> Cindy
>
> On 10/17/09 13:47, Stuart Anderson wrote:
>> What does it mean for the reported value of a zvol volsize to be less than the product of used and compressratio?

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] zvol used apparently greater than volsize for sparse volume
What does it mean for the reported value of a zvol volsize to be less than the product of used and compressratio? For example,

# zfs get -p all home1/home1mm01
NAME             PROPERTY        VALUE         SOURCE
home1/home1mm01  type            volume        -
home1/home1mm01  creation        1254440045    -
home1/home1mm01  used            14902492672   -
home1/home1mm01  available       16240062464   -
home1/home1mm01  referenced      14902492672   -
home1/home1mm01  compressratio   11.20x        -
home1/home1mm01  reservation     0             default
home1/home1mm01  volsize         161061273600  -
home1/home1mm01  volblocksize    16384         -
home1/home1mm01  checksum        on            default
home1/home1mm01  compression     gzip-1        inherited from home1
home1/home1mm01  readonly        off           default
home1/home1mm01  shareiscsi      off           default
home1/home1mm01  copies          1             default
home1/home1mm01  refreservation  0             default

Yet used (14902492672) * compressratio (11.20) = 166907917926, which is 3.6% larger than volsize.

Is this a bug or a feature for sparse volumes? If a feature, how much larger than volsize/compressratio can the actual used storage space grow, e.g., a fixed amount of overhead and/or a fixed percentage?

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
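The arithmetic above can be reproduced for any zvol with a one-liner (a sketch; the trailing "x" on compressratio is stripped defensively in case -p leaves it in place):

    # U=`zfs get -Hp -o value used home1/home1mm01`
    # R=`zfs get -Hp -o value compressratio home1/home1mm01 | sed 's/x$//'`
    # V=`zfs get -Hp -o value volsize home1/home1mm01`
    # echo "$U $R $V" | nawk '{printf("metadata overhead = %.1f%%\n", 100*($1*$2/$3 - 1))}'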
Re: [zfs-discuss] ZFS caching of compressed data
On Oct 2, 2009, at 5:05 AM, Robert Milkowski wrote:

> Stuart Anderson wrote:
>> I am wondering if the following idea makes any sense as a way to get ZFS to cache compressed data in DRAM? In particular, given a 2-way zvol mirror of highly compressible data on persistent storage devices, what would go wrong if I dynamically added a ramdisk as a 3rd mirror device at boot time? Would ZFS route most (or all) of the reads to the lower latency DRAM device? In the case of an un-clean shutdown where there was no opportunity to actively remove the ramdisk from the pool before shutdown, would there be any problem at boot time when the ramdisk is still registered but unavailable? Note, this Gedanken experiment is for highly compressible (~9x) metadata for a non-ZFS filesystem.
> You would only get about 33% of IO's served from ram-disk.

With SVM you are allowed to specify a read policy on sub-mirrors for just this reason, e.g.,
http://wikis.sun.com/display/BigAdmin/Using+a+SVM+submirror+on+a+ramdisk+to+increase+read+performance
Is there no equivalent in ZFS?

> However, at the KCA conference Bill and Jeff mentioned just-in-time decompression/decryption planned for ZFS. If I understand it correctly, some % of pages in the ARC will be kept compressed/encrypted and will be decompressed/decrypted only if accessed. This could be especially useful to do with prefetch.

I thought the optimization being discussed there was simply to avoid decompressing/decrypting unused data. I missed the part about keeping compressed data around in the ARC.

> Now I would imagine that one will be able to tune what percentage of the ARC should keep compressed pages.

That would be nice.

> Now I don't remember if they mentioned L2ARC here, but it would probably be useful to have a tunable which would put compressed or uncompressed data onto L2ARC depending on its value. Which approach is better would always depend on a given environment and on where an actual bottleneck is.

I agree something like this would be preferable to the SVM ramdisk solution.

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] ZFS caching of compressed data
I am wondering if the following idea makes any sense as a way to get ZFS to cache compressed data in DRAM?

In particular, given a 2-way zvol mirror of highly compressible data on persistent storage devices, what would go wrong if I dynamically added a ramdisk as a 3rd mirror device at boot time? Would ZFS route most (or all) of the reads to the lower latency DRAM device?

In the case of an un-clean shutdown where there was no opportunity to actively remove the ramdisk from the pool before shutdown, would there be any problem at boot time when the ramdisk is still registered but unavailable?

Note, this Gedanken experiment is for highly compressible (~9x) metadata for a non-ZFS filesystem.

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Transient permanent errors
I have seen this again on a different server. Presumably not a big deal, but a false alarm about data corruption is probably not good marketing for ZFS. Is this fixed in an opensolaris build?

# pca -l a -p ZFS
Using /var/tmp/patchdiag.xref from Sep/11/09
Host: samhome1 (SunOS 5.10/Generic_141415-10/i386/i86pc)
List: a (2/88)

Patch  IR   CR RSB Age Synopsis
------ -- - -- --- --- --------------------------------------------------------
141105 02 = 02 ---  58 SunOS 5.10_x86: ZFS Administration Java Web Console Patch
141909 03 = 03 R--  30 SunOS 5.10_x86: ZFS patch

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h7m, 93.90% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk/c0t0d0s0
        //dev/dsk/c0t1d0s0

# zpool status -v rpool
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h8m with 0 errors on Sun Sep 13 17:22:47 2009
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0

errors: No known data errors

Thanks.

On Jun 28, 2009, at 7:31 PM, Stuart Anderson wrote:

> This is S10U7 fully patched and not open solaris, but I would appreciate any advice on the following transient "Permanent error" message generated while running a zpool scrub.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Change the volblocksize of a ZFS volume
>>> Question: Is there a way to change the volume blocksize, say via 'zfs snapshot send/receive'? As I see things, this isn't possible as the target volume (including property values) gets overwritten by 'zfs receive'.
>> By default, properties are not received. To pass properties, you need to use the -R flag.
> I have tried that, and while it works for properties like compression, I have not found a way to preserve a non-default volblocksize across zfs send | zfs receive. The zvol created on the receive side is always defaulting to 8k. Is there a way to do this?

I spoke too soon. More particularly, during the zfs send/recv process the receiving side reports 8k, but once the receive is done the volblocksize reports the expected value as sent with -R. Hopefully, this is just a reporting bug during an active receive.

Note, this was observed with s10u7 (x86). Thanks.
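For reference, the sequence under discussion (pool and volume names illustrative); checking volblocksize on the receive side mid-stream is what produces the transient 8k reading:

    # zfs snapshot tank/vol@migrate
    # zfs send -R tank/vol@migrate | zfs receive -d tank2
    # zfs get volblocksize tank2/vol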
Re: [zfs-discuss] Change the volblocksize of a ZFS volume
>> Question: Is there a way to change the volume blocksize, say via 'zfs snapshot send/receive'? As I see things, this isn't possible as the target volume (including property values) gets overwritten by 'zfs receive'.
> By default, properties are not received. To pass properties, you need to use the -R flag.

I have tried that, and while it works for properties like compression, I have not found a way to preserve a non-default volblocksize across zfs send | zfs receive. The zvol created on the receive side is always defaulting to 8k. Is there a way to do this?

Thanks.
[zfs-discuss] Transient permanent errors
This is S10U7 fully patched and not open solaris, but I would appreciate any advice on the following transient "Permanent error" message generated while running a zpool scrub.

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h8m, 57.22% done, 0h6m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk/c2t0d0s0
        //dev/dsk/c2t1d0s0

Disconcerting that the actual pool devices are flagged as corrupt. However, these are just symbolic links, and when the scrub completed the Permanent errors had disappeared:

# zpool status -v rpool
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h19m with 0 errors on Sun Jun 28 19:21:19 2009
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0

errors: No known data errors

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Speeding up resilver on x4500
On Jun 23, 2009, at 11:50 AM, Richard Elling wrote:

>> (2) is there some reasonable way to read in multiples of these blocks in a single IOP? Theoretically, if the blocks are in chronological creation order, they should be (relatively) sequential on the drive(s). Thus, ZFS should be able to read in several of them without forcing a random seek. That is, you should be able to get multiple blocks in a single IOP.
> Metadata is prefetched. You can look at the hit rate in kstats. Stuart, you might post the output of
>     kstat -n vdev_cache_stats
> I regularly see cache hit rates in the 60% range, which isn't bad considering what is being cached.

# kstat -n vdev_cache_stats
module: zfs                             instance: 0
name:   vdev_cache_stats                class:    misc
        crtime                          129.03798177
        delegations                     25873382
        hits                            114064783
        misses                          182253696
        snaptime                        960064.85352608

Here are also some zpool iostat numbers during this resilver:

# zpool iostat ldas-cit1 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ldas-cit1   16.9T  3.49T    165    134  5.17M  1.58M
ldas-cit1   16.9T  3.49T    225    237  1.28M  1.98M
ldas-cit1   16.9T  3.49T    288    317  1.53M  2.26M
ldas-cit1   16.9T  3.49T    174    269  1014K  1.68M

And here is the pool configuration:

# zpool status ldas-cit1
  pool: ldas-cit1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 96h49m, 63.69% done, 55h12m to go
config:

        NAME               STATE     READ WRITE CKSUM
        ldas-cit1          DEGRADED     0     0     0
          raidz2           DEGRADED     0     0     0
            c0t1d0         ONLINE       0     0     0
            c1t1d0         ONLINE       0     0     0
            c3t1d0         ONLINE       0     0     0
            c4t1d0         ONLINE       0     0     0
            c5t1d0         ONLINE       0     0     0
            c6t1d0         ONLINE       0     0     0
            c0t2d0         ONLINE       0     0     0
            c1t2d0         ONLINE       0     0     0
            c3t2d0         ONLINE       0     0     0
            c4t2d0         ONLINE       0     0     0
            c5t2d0         ONLINE       0     0     0
            spare          DEGRADED     0     0     0
              replacing    DEGRADED     0     0     0
                c6t2d0s0/o FAULTED      0     0     0  corrupted data
                c6t2d0     ONLINE       0     0     0
              c6t0d0       ONLINE       0     0     0
            c0t3d0         ONLINE       0     0     0
            c1t3d0         ONLINE       0     0     0
            c3t3d0         ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            c4t3d0         ONLINE       0     0     0
            c5t3d0         ONLINE       0     0     0
            c6t3d0         ONLINE       0     0     0
            c0t4d0         ONLINE       0     0     0
            c1t4d0         ONLINE       0     0     0
            c3t4d0         ONLINE       0     0     0
            c5t0d0         ONLINE       0     0     0
            c5t4d0         ONLINE       0     0     0
            c6t4d0         ONLINE       0     0     0
            c0t5d0         ONLINE       0     0     0
            c1t5d0         ONLINE       0     0     0
            c3t5d0         ONLINE       0     0     0
            c4t5d0         ONLINE       0     0     0
            c5t5d0         ONLINE       0     0     0
            c6t5d0         ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            c0t6d0         ONLINE       0     0     0
            c1t6d0         ONLINE       0     0     0
            c3t6d0         ONLINE       0     0     0
            c4t6d0         ONLINE       0     0     0
            c5t6d0         ONLINE       0     0     0
            c6t6d0         ONLINE       0     0     0
            c0t7d0         ONLINE       0     0     0
            c1t7d0         ONLINE       0     0     0
            c3t7d0         ONLINE       0     0     0
            c4t7d0         ONLINE       0     0     0
            c5t7d0         ONLINE       0     0     0
            c6t7d0         ONLINE       0     0     0
            c0t0d0         ONLINE       0     0     0
            c1t0d0         ONLINE       0     0     0
            c3t0d0         ONLINE       0     0     0
        spares
          c6t0d0           INUSE     currently in use

errors: No known data errors

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
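Turning those counters into a hit rate (a sketch using kstat's parsable output); for the numbers above this works out to 114064783/296318479, i.e. about 38%, noticeably below the 60% figure cited:

    # kstat -p zfs:0:vdev_cache_stats:hits zfs:0:vdev_cache_stats:misses | \
        nawk '{v[NR]=$2} END {printf("vdev cache hit rate = %.1f%%\n", 100*v[1]/(v[1]+v[2]))}'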
Re: [zfs-discuss] Speeding up resilver on x4500
On Jun 21, 2009, at 10:21 PM, Nicholas Lee wrote:

> On Mon, Jun 22, 2009 at 4:24 PM, Stuart Anderson ander...@ligo.caltech.edu wrote:
>> However, it is a bit disconcerting to have to run with reduced data protection for an entire week. While I am certainly not going back to UFS, it seems like it should be at least theoretically possible to do this several orders of magnitude faster, e.g., what if every block on the replacement disk had its RAIDZ2 data recomputed from the degraded
> Maybe this is also saying that for large disk sets a single RAIDZ2 provides a false sense of security.

This configuration is with 3 large RAIDZ2 devices, but I have more recently been building thumper/thor systems with a larger number of smaller RAIDZ2's.

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Speeding up resilver on x4500
It is currently taking ~1 week to resilver an x4500 running S10U6 (recently patched) with ~170M small files on ~170 datasets after a disk failure/replacement, i.e.,

    scrub: resilver in progress for 53h47m, 30.72% done, 121h19m to go

Is there anything that can be tuned to improve this performance, e.g., adding a faster cache device for reading and/or writing?

I am also curious if anyone has a prediction on when the snapshot-restarting-resilvering bug will be patched in Solaris 10?
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Speeding up resilver on x4500
On Jun 21, 2009, at 8:57 PM, Richard Elling wrote:

> Stuart Anderson wrote:
>> It is currently taking ~1 week to resilver an x4500 running S10U6 (recently patched) with ~170M small files on ~170 datasets after a disk failure/replacement, i.e.,
> wow, that is impressive. There is zero chance of doing that with a manageable number of UFS file systems.

However, it is a bit disconcerting to have to run with reduced data protection for an entire week. While I am certainly not going back to UFS, it seems like it should be at least theoretically possible to do this several orders of magnitude faster, e.g., what if every block on the replacement disk had its RAIDZ2 data recomputed from the degraded array regardless of whether the pool was using it or not? In that case I would expect it to be able to sequentially reconstruct in the same few hours it would take a HW RAID controller to do the same RAID6 job.

Perhaps there needs to be an option to re-order the loops for resilvering on pools with lots of small files, to resilver in device order rather than filesystem order?

>> scrub: resilver in progress for 53h47m, 30.72% done, 121h19m to go
>> Is there anything that can be tuned to improve this performance, e.g., adding a faster cache device for reading and/or writing?
> Resilver tends to be bound by one of two limits:
>   1. sequential write performance of the resilvering device
>   2. random I/O performance of the non-resilvering devices

A quick look at iostat leads me to conjecture that the vdev rebuilding is taking a very low priority compared to ongoing application I/O (NFSD in this case). Are there any ZFS knobs that control the relative priority of resilvering to other disk I/O tasks?

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
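For later readers: S10U6 exposes no supported priority knob, but contemporary OpenSolaris builds carry a zfs_scrub_limit tunable, and later builds added zfs_resilver_delay/zfs_resilver_min_time_ms; whether a given kernel has such a symbol can be checked, and the value adjusted, with mdb (unsupported, values illustrative):

    # echo "zfs_scrub_limit/D" | mdb -k         (read the current value)
    # echo "zfs_scrub_limit/W 0t20" | mdb -kw   (allow more concurrent scrub I/Os)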
Re: [zfs-discuss] Confused by compressratio
On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:

> UTSL. compressratio is the ratio of uncompressed bytes to compressed bytes.
> http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv
> IMHO, you will (almost) never get the same number looking at bytes as you get from counting blocks.

If I can't use /bin/ls to get an accurate measure of the number of compressed blocks used (-s) and the original number of uncompressed bytes (-l), what is a more accurate way to measure these?

As a gedanken experiment, what command(s) can I run to examine a compressed ZFS filesystem and determine how much space it will require to replicate to an uncompressed ZFS filesystem? I can add up the file sizes, e.g.,

    /bin/ls -lR | grep ^- | nawk '{SUM+=$5} END {print SUM}'

but I would have thought there was a more efficient way using the already aggregated filesystem metadata via /bin/df or zfs list and the compressratio.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
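One shortcut for the gedanken experiment, assuming GNU du is available (apparent size approximates the uncompressed bytes a replica would need, modulo sparse files and metadata overhead):

    # du --apparent-size -sk /export/compress   (logical size, GNU du only)
    # du -sk /export/compress                   (blocks actually allocated)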
Re: [zfs-discuss] Confused by compressratio
On Wed, Apr 16, 2008 at 10:09:00AM -0700, Richard Elling wrote:

> Stuart Anderson wrote:
>> On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:
>>> UTSL. compressratio is the ratio of uncompressed bytes to compressed bytes.
>>> IMHO, you will (almost) never get the same number looking at bytes as you get from counting blocks.
>> If I can't use /bin/ls to get an accurate measure of the number of compressed blocks used (-s) and the original number of uncompressed bytes (-l), what is a more accurate way to measure these?
> ls -s should give the proper number of blocks used. ls -l should give the proper file length. Do not assume that compressed data in a block consumes the whole block.

Not even on a pristine ZFS filesystem where just one file has been created?

>> As a gedanken experiment, what command(s) can I run to examine a compressed ZFS filesystem and determine how much space it will require to replicate to an uncompressed ZFS filesystem? I can add up the file sizes, but I would have thought there was a more efficient way using the already aggregated filesystem metadata via /bin/df or zfs list and the compressratio.
> IMHO, this is a by-product of the dynamic nature of ZFS.

Are you saying it can't be done except by adding up all the individual file sizes?

> Personally, I'd estimate using du rather than ls.

They report the exact same number as far as I can tell. With the caveat that Solaris ls -s returns the number of 512-byte blocks, whereas GNU ls -s returns the number of 1024-byte blocks by default.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Confused by compressratio
On Wed, Apr 16, 2008 at 02:07:53PM -0700, Richard Elling wrote:

>>> Personally, I'd estimate using du rather than ls.
>> They report the exact same number as far as I can tell. With the caveat that Solaris ls -s returns the number of 512-byte blocks, whereas GNU ls -s returns the number of 1024-byte blocks by default.
> That is file-system dependent. Some file systems have larger blocks and ls -s shows the size in blocks. ZFS uses dynamic block sizes, but you knew that already... :-)
> -- richard

OK, we are now clearly exposing my ignorance, so hopefully I can learn something new about ZFS.

What is the distinction/relationship between recordsize (which, as I understand it, is a fixed quantity for each ZFS dataset) and dynamic block sizes? Are blocks what are allocated for metadata, and records what are allocated for data, i.e., the contents of files?

What does it mean that blocks are compressed for a ZFS dataset with compression=off? Is this equivalent to saying that ZFS metadata is always compressed?

Is there any ZFS documentation that shows by example exactly how to interpret the various numbers from ls, du, df, and zfs used/referenced/available/compressratio in the context of compression={on,off}, possibly also referring to both sparse and non-sparse files?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Confused by compressratio
On Tue, Apr 15, 2008 at 01:37:43PM -0400, Luke Scharf wrote:

>> # zfs list /export/compress
>> NAME                  USED  AVAIL  REFER  MOUNTPOINT
>> export-cit/compress  90.4M  1.17T  90.4M  /export/compress
>> is 2GB/90.4M = 2048 / 90.4 = 22.65
>> That still leaves me puzzled what the precise definition of compressratio is?
> My guess is that the compressratio doesn't include any of those runs of null characters that weren't actually written to the disk.

This test was done with a file created via "/bin/yes | head", i.e., it does not have any null characters, specifically to rule out this possibility.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Confused by compressratio
On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:

> Stuart Anderson wrote:
>> As an artificial test, I created a filesystem with compression enabled and ran mkfile 1g, and the reported compressratio for that filesystem is 1.00x even though this 1GB file uses only 1kB.
> ZFS seems to treat files filled with zeroes as sparse files, regardless of whether or not compression is enabled. Try "dd if=/dev/urandom of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit this behavior. Creating this file is a lot slower than writing zeroes (mostly due to the speed of the urandom device), but ZFS won't treat it like a sparse file, and it won't compress very well either.

However, I am still trying to reconcile the compression ratio as reported by compressratio vs the ratio of file sizes to disk blocks used (whether or not ZFS is creating sparse files).

Regarding sparse files, I recently found that the built-in heuristic for auto-detecting and creating sparse files in the GNU cp program works on ZFS filesystems. In particular, if you use GNU cp to copy a file from ZFS and it has a string of null characters in it (whether or not it is stored as a sparse file), the output file (regardless of the destination filesystem type) will be a sparse file. I have not seen this behavior when copying such files from other source filesystems.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
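GNU cp's sparse handling is also controllable, which helps when a byte-for-byte copy is wanted for measurement (these flags are GNU-specific; the destination path is illustrative):

    # cp --sparse=never 1g.dat /ufs/1g.dat    (force a fully allocated copy)
    # cp --sparse=always 1g.dat /ufs/1g.dat   (punch holes wherever possible)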
Re: [zfs-discuss] Confused by compressratio
On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:

> Stuart Anderson wrote:
>> On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
>>> Stuart Anderson wrote:
>>>> As an artificial test, I created a filesystem with compression enabled and ran mkfile 1g, and the reported compressratio for that filesystem is 1.00x even though this 1GB file uses only 1kB.
>>> ZFS seems to treat files filled with zeroes as sparse files, regardless of whether or not compression is enabled. Try "dd if=/dev/urandom of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit this behavior. Creating this file is a lot slower than writing zeroes (mostly due to the speed of the urandom device), but ZFS won't treat it like a sparse file, and it won't compress very well either.
>> However, I am still trying to reconcile the compression ratio as reported by compressratio vs the ratio of file sizes to disk blocks used (whether or not ZFS is creating sparse files).
> Can you describe the data you're storing a bit? Any big disk images?

Understanding the mkfile case would be a start, but the initial filesystem that started my confusion is one that has a number of ~50GByte mysql database files as well as a number of application code files.

Here is another simple test to avoid any confusion/bugs related to NULL character sequences being compressed to nothing versus being treated as sparse files. In particular, a 2GByte file full of the output of /bin/yes:

# zfs create export-cit/compress
# cd /export/compress
# /bin/df -k .
Filesystem            kbytes      used   avail       capacity  Mounted on
export-cit/compress   1704858624  55     1261199742  1%        /export/compress
# zfs get compression export-cit/compress
NAME                 PROPERTY     VALUE  SOURCE
export-cit/compress  compression  on     inherited from export-cit
# /bin/yes | head -1073741824 > yes.dat
# /bin/ls -ls yes.dat
185017 -rw-r--r--   1 root     root     2147483648 Apr 14 15:31 yes.dat
# /bin/df -k .
Filesystem            kbytes      used   avail       capacity  Mounted on
export-cit/compress   1704858624  92563  1261107232  1%        /export/compress
# zfs get compressratio export-cit/compress
NAME                 PROPERTY       VALUE   SOURCE
export-cit/compress  compressratio  28.39x  -

So compressratio reports 28.39, but the ratio of file size to used disk for the only regular file on this filesystem, i.e., excluding the initial 55kB allocated for the empty filesystem, is:

    2147483648 / (185017 * 512) = 22.67

Calculated another way from zfs list for the entire filesystem:

# zfs list /export/compress
NAME                  USED  AVAIL  REFER  MOUNTPOINT
export-cit/compress  90.4M  1.17T  90.4M  /export/compress

is 2GB/90.4M = 2048 / 90.4 = 22.65.

That still leaves me puzzled what the precise definition of compressratio is?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Confused by compressratio
I am confused by the numerical value of compressratio. I copied a compressed ZFS filesystem that is 38.5G in size (zfs list USED and REFER value) and reports a compressratio value of 2.52x to an uncompressed ZFS filesystem, and it expanded to 198G. So why is the compressratio 2.52 rather than 198/38.5 = 5.14?

As an artificial test, I created a filesystem with compression enabled and ran mkfile 1g, and the reported compressratio for that filesystem is 1.00x even though this 1GB file uses only 1kB.

Note, this was done with ZFS version 4 on S10U4. I would appreciate any help in understanding what compressratio means.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] scrub performance
I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07) for which zpool scrub is making very slow progress even though the necessary resources are apparently available. Currently it has been running for 3 days to reach 75% completion; however, in the last 12 hours this only advanced by 3%.

At times this server is busy running NFSD, and it is understandable that the scrub would take a lower priority; however, I have observed interestingly long time intervals when neither prstat nor iostat show any obvious bottlenecks, e.g., disks at 10% busy. Is there a throttle on scrub resource allocation that does not readily open up again after being limited due to other system activity?

For comparison, an identical system (same OS/zpool config, and roughly the same number of filesystems and files) finished a scrub in 2 days. This is not a critical problem, but at least initially it was clear from iostat that the scrub was using all the disk IOPS/BW available, and I am curious why it has backed off from that after a few days of running.

P.S. I realize it is not a user command and that the last event can be found in zpool status, but I would find it convenient if the scrub completion event was also logged in the zpool history along with the initiation event.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
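An aside for archive readers: on builds whose history code records internal events, scrub completion does appear in the extended history (the -i flag may not exist on S10U4; pool name illustrative):

    # zpool history -i tank | grep -i scrub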
Re: [zfs-discuss] scrub performance
On Thu, Mar 06, 2008 at 11:51:00AM -0800, Stuart Anderson wrote:

> I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07) for which zpool scrub is making very slow progress even though the necessary resources are apparently available. Currently it has

It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?

Thanks.

# zpool status | egrep -e 'progress|errors' ; date
 scrub: scrub in progress, 75.49% done, 28h51m to go
errors: No known data errors
Thu Mar  6 08:50:59 PST 2008
# zpool status | egrep -e 'progress|errors' ; date
 scrub: scrub in progress, 75.24% done, 31h20m to go
errors: No known data errors
Thu Mar  6 15:15:39 PST 2008

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] scrub performance
On Thu, Mar 06, 2008 at 05:55:53PM -0800, Marion Hakanson wrote:

> [EMAIL PROTECTED] said:
>> It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?
> Sorry I don't have any helpful experience in this area. It occurs to me that perhaps you are detecting a gravity wave of some sort -- Thumpers are pretty heavy, and thus may be more affected than the average server. Or the guys at SLAC have, unbeknownst to you, somehow accelerated your Thumper to near the speed of light. (:-)

If true, that would certainly help, since we actually are using these thumpers to help detect gravitational waves! See http://www.ligo.caltech.edu.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
In this particular case, will 127729-07 contain all the bug fixes in IDR127787-12 (or later)? I have also run into a few other kernel panics addressed in earlier revisions of this IDR, but I am eager to get back on the main Sol10 branch.

Thanks.

On Mon, Feb 18, 2008 at 08:45:46PM -0800, Prabahar Jeyaram wrote:

> Any IDRXX (released immediately) is the interim relief (which will also contain the fix) provided to customers until the official patch (usually takes longer to be released) is available. The patch is to be considered the permanent solution.
>
> -- Prabahar.
>
> Stuart Anderson wrote:
>> Thanks for the information. How does the temporary patch 127729-07 relate to the IDR127787 (x86) which I believe also claims to fix this panic?

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
Is this kernel panic a known ZFS bug, or should I open a new ticket? Note, this happened on an X4500 running S10U4 (127112-06) with NCQ disabled.

Thanks.

Feb 18 17:55:18 thumper1 ^Mpanic[cpu1]/thread=fe8000809c80:
Feb 18 17:55:18 thumper1 genunix: [ID 403854 kern.notice] assertion failed: arc_buf_remove_ref(db->db_buf, db) == 0, file: ../../common/fs/zfs/dbuf.c, line: 1692
Feb 18 17:55:18 thumper1 unix: [ID 10 kern.notice]
Feb 18 17:55:18 thumper1 genunix: [ID 802836 kern.notice] fe80008099d0 fb9c9853 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809a00 zfs:zfsctl_ops_root+2fac59f2 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809a30 zfs:dbuf_write_done+c8 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809a70 zfs:arc_write_done+13b ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809ac0 zfs:zio_done+1b8 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809ad0 zfs:zio_next_stage+65 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b00 zfs:zio_wait_for_children+49 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b10 zfs:zio_wait_children_done+15 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b20 zfs:zio_next_stage+65 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b60 zfs:zio_vdev_io_assess+84 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b70 zfs:zio_next_stage+65 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809bd0 zfs:vdev_mirror_io_done+c1 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809be0 zfs:zio_vdev_io_done+14 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809c60 genunix:taskq_thread+bc ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809c70 unix:thread_start+8 ()
Feb 18 17:55:18 thumper1 unix: [ID 10 kern.notice]

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
On Mon, Feb 18, 2008 at 06:28:31PM -0800, Stuart Anderson wrote:

> Is this kernel panic a known ZFS bug, or should I open a new ticket?
>
> Feb 18 17:55:18 thumper1 genunix: [ID 403854 kern.notice] assertion failed: arc_buf_remove_ref(db->db_buf, db) == 0, file: ../../common/fs/zfs/dbuf.c, line: 1692

It looks like this might be bug 6523336,
http://sunsolve.sun.com/search/document.do?assetkey=1-66-201229-1

Does anyone know when the "Binary relief" for this and other Sol10 ZFS kernel panics will be released as normal kernel patches?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
Thanks for the information. How does the temporary patch 127729-07 relate to the IDR127787 (x86) which I believe also claims to fix this panic?

Thanks.

On Mon, Feb 18, 2008 at 08:32:03PM -0800, Prabahar Jeyaram wrote:

> The patches (127728-06 : sparc, 127729-07 : x86) which have the fix for this panic are in a temporary state and will be released via SunSolve soon. Please contact your support channel to get these patches.
>
> -- Prabahar.
>
> Stuart Anderson wrote:
>> On Mon, Feb 18, 2008 at 06:28:31PM -0800, Stuart Anderson wrote:
>>> Is this kernel panic a known ZFS bug, or should I open a new ticket?
>>> Feb 18 17:55:18 thumper1 genunix: [ID 403854 kern.notice] assertion failed: arc_buf_remove_ref(db->db_buf, db) == 0, file: ../../common/fs/zfs/dbuf.c, line: 1692
>> It looks like this might be bug 6523336,
>> http://sunsolve.sun.com/search/document.do?assetkey=1-66-201229-1
>> Does anyone know when the "Binary relief" for this and other Sol10 ZFS kernel panics will be released as normal kernel patches?
>> Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Parallel zfs destroy results in No more processes
> Do you have SATA Native Command Queuing enabled? I've experienced delays of just under one minute when NCQ is enabled that do not occur when NCQ is disabled. If all the threads comprising the parallel zfs destroy hang for a minute, I bet it's the hang that causes "No more processes". I have opened a trouble ticket on this issue and am waiting for feedback. In the meantime, I've disabled NCQ by adding the line below to /etc/system (and rebooting).
>
> set sata:sata_func_enable = 0x5

Not on this system. It is not clear to me how these timeout/disconnect problems would cause a call to fork() to fail, but I can give that a try the next time I need to delete that much data.

However, we have disabled NCQ through this mechanism on another system that was locking up ~1/week with several "device disconnected" messages. That system has been up for 2 weeks since disabling NCQ and has not displayed any disconnected messages since then.

Can anyone confirm that 125205-07 has solved these NCQ problems? Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
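As an aside, one way to confirm after the reboot that the /etc/system setting actually took is to read the live kernel value back with mdb. A minimal sketch, assuming the sata module is loaded so the symbol resolves (the variable is the one from the set line above):

# print the sata driver's feature-enable mask; expect the 0x5 set above
echo 'sata`sata_func_enable/X' | mdb -k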
[zfs-discuss] X4500 device disconnect problem persists
After applying 125205-07 on two X4500 machines running Sol10U4 and removing set sata:sata_func_enable = 0x5 from /etc/system to re-enable NCQ, I am again observing drive disconnect error messages. This is in spite of the patch description, which claims multiple fixes in this area, for example:

6587133 repeated DMA command timeouts and device resets on x4500
6538627 x4500 message logs contain multiple device disk resets but nothing logged in FMA
6564956 Disparity error for marvell88sx3 was shown during boot-time

Has anyone else had any better luck with this? Thanks.

Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: no matching NCQ I/O found
Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: device disconnected or device error
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: device reset
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: link lost
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: link established
Oct 26 16:25:34 thumper2 marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 1:
Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info]        device disconnected
Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info]        device connected
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd25):
Oct 26 16:25:34 thumper2  Error for Command: read(10)    Error Level: Retryable
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   Requested Block: 521002402    Error Block: 521002402
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   Vendor: ATA    Serial Number:
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   Sense Key: No Additional Sense
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Parallel zfs destroy results in No more processes
On Wed, Oct 24, 2007 at 10:40:41AM -0700, David Bustos wrote:
> Quoth Stuart Anderson on Sun, Oct 21, 2007 at 07:09:10PM -0700:
> > Running 102 parallel zfs destroy -r commands on an X4500 running S10U4 has resulted in "No more processes" errors in existing login shells for several minutes of time, but then fork() calls started working again. However, none of the zfs destroy processes have actually completed yet, which is odd since some of the filesystems are trivially small.
> > ...
> > Is this a known issue? Any ideas on what resource lots of zfs commands use up to prevent fork() from working?
>
> ZFS is known to use a lot of memory. I suspect this problem has diminished in recent Nevada builds. Can you try this on Nevada?

I suspect it is more subtle than this, since top was reporting that none of the available swap space was being used, so there was 16GB of free VM. Unfortunately, I am not currently in a position to switch this system over to Nevada. Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
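For anyone trying to reproduce this, two quick ways to see whether VM reservation or kernel memory is actually the squeeze when fork() starts failing. A sketch using stock S10 tools (swap -s and the mdb ::memstat dcmd; run as root):

# virtual memory allocated, reserved, and still available
swap -s

# physical memory breakdown, including kernel pages (where the ZFS ARC lives)
echo '::memstat' | mdb -k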
[zfs-discuss] Parallel zfs destroy results in No more processes
Running 102 parallel zfs destroy -r commands on an X4500 running S10U4 has resulted in "No more processes" errors in existing login shells for several minutes at a time, but then fork() calls started working again. However, none of the zfs destroy processes have actually completed yet, which is odd since some of the filesystems are trivially small. After fork() started working there were hardly any processes other than the 102 zfs destroys running on the system, i.e.,

# ps -ef | wc -l
154

Here is a snapshot of top that looks reasonable; note especially that free swap is 16GB and that the last pid is still in the range of the ~100 zfs commands being run.

Is this a known issue? Any ideas on what resource lots of zfs commands use up to prevent fork() from working? Thanks.

last pid: 11473;  load avg: 0.35, 0.87, 0.68;  up 9+00:21:42    18:56:38
148 processes: 146 sleeping, 1 zombie, 1 on cpu
CPU states: 94.2% idle, 0.0% user, 5.8% kernel, 0.0% iowait, 0.0% swap
Memory: 16G phys mem, 1029M free mem, 16G total swap, 16G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 11333 root       1  59    0 3188K  772K cpu/3    0:01  0.02% top
   622 noaccess  28  59    0  172M 4528K sleep    4:28  0.01% java
   528 root       1  59    0   20M 5092K sleep    2:44  0.01% Xorg
   431 root      11  59    0 5620K 1248K sleep    0:01  0.01% syslogd
   565 root       1  59    0   10M 1384K sleep    0:53  0.00% dtgreet
   206 root       1 100  -20 2068K 1128K sleep    0:21  0.00% xntpd
 10864 root       1  59    0 7416K 1216K sleep    0:00  0.00% sshd
     7 root      14  59    0   12M  680K sleep    0:05  0.00% svc.startd
   158 root      33  59    0 6864K 1616K sleep    0:15  0.00% nscd
   312 root       1  59    0 1112K  660K sleep    0:00  0.00% utmpd
   340 root       3  59    0 3932K 1312K sleep    0:00  0.00% inetd
   582 root      22  59    0   17M 2028K sleep    5:49  0.00% fmd
 11432 root       1  59    0 4556K 1496K sleep    0:30  0.00% zfs
 11449 root       1  59    0 4556K 1496K sleep    0:27  0.00% zfs
 11360 root       1  59    0 4552K 1492K sleep    0:26  0.00% zfs

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
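"No more processes" is the shell's wording for fork() failing with EAGAIN, so besides free VM it may be worth checking whether the process table or a per-user limit was briefly pegged. A sketch of what to look at (max_nprocs, maxuprc, and nproc are standard Solaris kernel variables; sar is stock S10):

# configured ceiling, per-user ceiling, and current process count
echo 'max_nprocs/D' | mdb -k
echo 'maxuprc/D' | mdb -k
echo 'nproc/D' | mdb -k

# watch process-table utilization over time (proc-sz column)
sar -v 5 5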
Re: [zfs-discuss] chgrp -R hangs all writes to pool
On Mon, Jul 16, 2007 at 09:36:06PM -0700, Stuart Anderson wrote:
> Running Solaris 10 Update 3 on an X4500 I have found that it is possible to reproducibly block all writes to a ZFS pool by running chgrp -R on any large filesystem in that pool. As can be seen in the zpool iostat output below, after about 10-sec of running the chgrp command all writes to the pool stop, and the pool starts exclusively running a slow background task of 1kB reads. At this point the chgrp -R command is not killable via root kill -9, and in fact even the command halt -d does not do anything.

For posterity, this appears to have been fixed in S10U4; at least, I am unable to reproduce the problem that was easy to trigger with S10U3. Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] zpool degraded status after resilver completed
Possibly related is the fact that fmd is now in a CPU spin loop constantly checking the time, even though there are no reported faults, i.e.,

# fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty

# svcs fmd
STATE          STIME    FMRI
online         13:11:43 svc:/system/fmd:default

# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   422 root       17M   13M run     11    0  20:42:51  19% fmd/22

# truss -p 422 | head -20
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    lwp_park(0xFDB7DF40, 0)                 Err#62 ETIME
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453

Is this a known bug with fmd and ZFS? Thanks.

On Fri, Sep 07, 2007 at 08:55:52PM -0700, Stuart Anderson wrote:
> I am curious why zpool status reports a pool to be in the DEGRADED state after a drive in a raidz2 vdev has been successfully replaced. In this particular case drive c0t6d0 was failing, so I ran
>
> zpool offline home c0t6d0
> zpool replace home c0t6d0 c8t1d0
>
> and after the resilvering finished the pool reports a degraded state. Hopefully this is incorrect. At this point, is the vdev in question now fully raidz2 protected even though it is listed as DEGRADED?

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
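If anyone picks this up, the user-level stacks of the spinning threads would probably pin down the loop faster than the truss output. A minimal sketch (pstack is stock Solaris; 422 is the fmd pid from the prstat output above):

# dump the user-level stacks of all of fmd's threads
pstack 422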
[zfs-discuss] zpool degraded status after resilver completed
I am curious why zpool status reports a pool to be in the DEGRADED state after a drive in a raidz2 vdev has been successfully replaced. In this particular case drive c0t6d0 was failing, so I ran

zpool offline home c0t6d0
zpool replace home c0t6d0 c8t1d0

and after the resilvering finished the pool reports a degraded state. Hopefully this is incorrect. At this point, is the vdev in question now fully raidz2 protected even though it is listed as DEGRADED?

P.S. This is on a pool created on S10U3 and upgraded to ZFS version 4 after upgrading the host to S10U4.

Thanks.

# zpool status
  pool: home
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed with 0 errors on Fri Sep 7 18:39:03 2007
config:

        NAME          STATE     READ WRITE CKSUM
        home          DEGRADED     0     0     0
          raidz2      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c5t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c8t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c8t3d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c8t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c8t5d0    ONLINE       0     0     0
          raidz2      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              c0t6d0  OFFLINE      0     0     0
              c8t1d0  ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     0
            c8t6d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
            c7t7d0    ONLINE       0     0     0
            c8t7d0    ONLINE       0     0     0
        spares
          c8t1d0      INUSE     currently in use

errors: No known data errors

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
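For what it's worth, a pool is apparently expected to report DEGRADED for as long as a hot spare is standing in for an offlined disk, even though the raidz2 vdev has full redundancy through the spare. A sketch of the usual cleanup, assuming c0t6d0 really is gone for good: detaching the old disk promotes the spare to a permanent member and should return the pool to ONLINE.

# remove the offlined disk from the config; c8t1d0 then ceases to be a "spare"
zpool detach home c0t6d0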
[zfs-discuss] Kernel panic receiving incremental snapshots
Before I open a new case with Sun, I am wondering if anyone has seen this kernel panic before? It happened on an X4500 running Sol10U3 while it was receiving incremental snapshot updates. Thanks.

Aug 25 17:01:50 ldasdata6 ^Mpanic[cpu0]/thread=fe857d53f7a0:
Aug 25 17:01:50 ldasdata6 genunix: [ID 895785 kern.notice] dangling dbufs (dn=fe82a3532d10, dbuf=fe8b4e338b90)
Aug 25 17:01:50 ldasdata6 unix: [ID 10 kern.notice]
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112a80 zfs:zfsctl_ops_root+2fa59a42 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112ac0 zfs:dmu_objset_evict_dbufs+e2 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112af0 zfs:dmu_objset_evict+30 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b10 zfs:zfsctl_ops_root+2fa5c0e1 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b30 zfs:dbuf_evict_user+44 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b60 zfs:zfsctl_ops_root+2fa4de31 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b90 zfs:dsl_dataset_close+56 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112bb0 zfs:dmu_objset_close+1d ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d20 zfs:dmu_recvbackup+5b5 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d40 zfs:zfs_ioc_recvbackup+45 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d80 zfs:zfsdev_ioctl+146 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d90 genunix:cdev_ioctl+1d ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112db0 specfs:spec_ioctl+50 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112de0 genunix:fop_ioctl+25 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112ec0 genunix:ioctl+ac ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112f10 unix:sys_syscall32+101 ()
Aug 25 17:01:50 ldasdata6 unix: [ID 10 kern.notice]
Aug 25 17:01:50 ldasdata6 genunix: [ID 672855 kern.notice] syncing file systems...
Aug 25 17:01:51 ldasdata6 genunix: [ID 904073 kern.notice] done
Aug 25 17:01:52 ldasdata6 genunix: [ID 111219 kern.notice] dumping to /dev/md/dsk/d3, offset 1645084672, content: kernel

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] chgrp -R hangs all writes to pool
On Tue, Jul 17, 2007 at 03:08:44PM +1000, James C. McPherson wrote:
> Log a new case with Sun, and make sure you supply a crash dump so people who know ZFS can analyze the issue. You can use stop-A sync, break sync, or reboot -dq

That does appear to have caused a panic/kernel dump. However, I cannot find the dump image after rebooting to Solaris, even though savecore appears to be configured:

# reboot -dq
Jul 17 12:27:35 x4500gc reboot: rebooted by root

panic[cpu2]/thread=9823c460: forced crash dump initiated at user request

fe8000e18d60 genunix:kadmin+4b4 ()
fe8000e18ec0 genunix:uadmin+93 ()
fe8000e18f10 unix:sys_syscall32+101 ()

syncing file systems... 1 1 done
dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
100% done: 3268790 pages dumped, compression ratio 12.39, dump succeeded
rebooting...

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/md/dsk/d2 (swap)
Savecore directory: /var/crash/x4500gc
  Savecore enabled: yes

# ls -laR /var/crash/x4500gc/
/var/crash/x4500gc/:
total 2
drwx------   2 root     root         512 Jul 12 16:26 .
drwxr-xr-x   3 root     root         512 Jul 12 16:26 ..

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
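When savecore fails to run at boot, the dump image is usually still sitting on the dump device and can be pulled off by hand. A sketch of what I would try (standard savecore(1M) flags; the directory is the one dumpadm reports above):

# -v: verbose; -d: save the dump even if the header claims it was already saved
savecore -vd /var/crash/x4500gc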
Re: [zfs-discuss] chgrp -R hangs all writes to pool
It looks like there is a problem dumping a kernel panic on an X4500. During the self-induced panic, there were additional syslog messages that indicate a problem writing to the two disks that make up /dev/md/dsk/d2 in my case. It is as if the SATA controllers are being reset during the crash dump. At any rate, I will send this all to Sun support. Thanks.

Jul 17 12:27:35 x4500gc unix: [ID 836849 kern.notice]
Jul 17 12:27:35 x4500gc ^Mpanic[cpu2]/thread=9823c460:
Jul 17 12:27:35 x4500gc genunix: [ID 156897 kern.notice] forced crash dump initiated at user request
Jul 17 12:27:35 x4500gc unix: [ID 10 kern.notice]
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fe8000e18d60 genunix:kadmin+4b4 ()
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fe8000e18ec0 genunix:uadmin+93 ()
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fe8000e18f10 unix:sys_syscall32+101 ()
Jul 17 12:27:35 x4500gc unix: [ID 10 kern.notice]
Jul 17 12:27:35 x4500gc genunix: [ID 672855 kern.notice] syncing file systems...
Jul 17 12:27:35 x4500gc genunix: [ID 733762 kern.notice] 1
Jul 17 12:27:37 x4500gc last message repeated 1 time
Jul 17 12:27:38 x4500gc genunix: [ID 904073 kern.notice] done
Jul 17 12:27:39 x4500gc genunix: [ID 111219 kern.notice] dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
Jul 17 12:27:39 x4500gc marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 0:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device disconnected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device connected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        SError interrupt
Jul 17 12:27:39 x4500gc marvell88sx: [ID 131198 kern.info]        SErrors:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Recovered communication error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                PHY ready change
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                10-bit to 8-bit decode error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Disparity error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 4:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device disconnected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device connected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        SError interrupt
Jul 17 12:27:39 x4500gc marvell88sx: [ID 131198 kern.info]        SErrors:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Recovered communication error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                PHY ready change
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                10-bit to 8-bit decode error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Disparity error
Jul 17 12:28:39 x4500gc genunix: [ID 409368 kern.notice] ^M100% done: 3268790 pages dumped, compression ratio 12.39,
Jul 17 12:28:39 x4500gc genunix: [ID 851671 kern.notice] dump succeeded
Jul 17 12:30:38 x4500gc genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_125101-10 64-bit
Jul 17 12:30:38 x4500gc genunix: [ID 943907 kern.notice] Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

On Tue, Jul 17, 2007 at 12:40:16PM -0700, Stuart Anderson wrote:
> On Tue, Jul 17, 2007 at 03:08:44PM +1000, James C. McPherson wrote:
> > Log a new case with Sun, and make sure you supply a crash dump so people who know ZFS can analyze the issue. You can use stop-A sync, break sync, or reboot -dq
>
> That does appear to have caused a panic/kernel dump. However, I cannot find the dump image after rebooting to Solaris, even though savecore appears to be configured:
>
> # reboot -dq
> Jul 17 12:27:35 x4500gc reboot: rebooted by root
>
> panic[cpu2]/thread=9823c460: forced crash dump initiated at user request
>
> fe8000e18d60 genunix:kadmin+4b4 ()
> fe8000e18ec0 genunix:uadmin+93 ()
> fe8000e18f10 unix:sys_syscall32+101 ()
>
> syncing file systems... 1 1 done
> dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
> 100% done: 3268790 pages dumped, compression ratio 12.39, dump succeeded
> rebooting...
>
> # dumpadm
>       Dump content: kernel pages
>        Dump device: /dev/md/dsk/d2 (swap)
> Savecore directory: /var/crash/x4500gc
>   Savecore enabled: yes
>
> # ls -laR /var/crash/x4500gc/
> /var/crash/x4500gc/:
> total 2
> drwx------   2 root     root         512 Jul 12 16:26 .
> drwxr-xr-x   3 root     root         512 Jul 12 16:26 ..
>
> Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] chgrp -R hangs all writes to pool
in the output of dmesg, svcs -xv, or fmdump associated with this event. Is this a known issue or should I open a new case with Sun? Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] chgrp -R hangs all writes to pool
On Tue, Jul 17, 2007 at 02:49:08PM +1000, James C. McPherson wrote:
> Stuart Anderson wrote:
> > Running Solaris 10 Update 3 on an X4500 I have found that it is possible to reproducibly block all writes to a ZFS pool by running chgrp -R on any large filesystem in that pool. As can be seen in the zpool iostat output below, after about 10-sec of running the chgrp command all writes to the pool stop, and the pool starts exclusively running a slow background task of 1kB reads.
> > ...
> > Is this a known issue or should I open a new case with Sun?
>
> Log a new case with Sun, and make sure you supply a crash dump so people who know ZFS can analyze the issue. You can use stop-A sync, break sync, or reboot -dq

In previous attempts, neither halt -d nor reboot (with no arguments) were able to shut down the machine. Is reboot -dq really a bigger hammer than halt -d?

Sorry to be pedantic, but what is the exact key sequence on a Sun USB keyboard one should use to force a kernel dump on Solx86? Since there is no OBP on an X4500, where do I type the sync command?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
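As I understand it, the x86 substitute for OBP's stop-A/sync is the kernel debugger rather than a bare keyboard chord. A sketch of the usual sequence, assuming console access and that loading kmdb on the box is acceptable (mdb -K, the F1-A break sequence, and the $<systemdump macro are the stock Solaris x86 mechanisms, but verify the exact steps against the docs for your release):

# load kmdb on the running system; this drops straight into the
# debugger at the console, and :c resumes normal operation
mdb -K

# later, when the machine wedges: F1-A at the console breaks back
# into kmdb, and this macro forces a panic plus crash dump
$<systemdump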