Re: [zfs-discuss] Problem booting after zfs upgrade
On Aug 5, 2011, at 8:55 PM, Edward Ned Harvey wrote:

> In any event... You need to do something like this:
>     installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
> (substitute whatever device slice you have used for rpool)

That did the trick, thanks.

Out of curiosity, does anyone know at what version you get a warning, and at what version installgrub is run automatically after upgrading a root pool/filesystem?

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
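For anyone hitting the same symptom, the fix amounts to re-running installgrub after the pool upgrade; a minimal sketch for x86, assuming the root pool lives on c1t0d0s0 (device name illustrative):

    # zpool upgrade rpool
    # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0

On SPARC the analogous step is installboot with the ZFS bootblk rather than installgrub.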
[zfs-discuss] Problem booting after zfs upgrade
After upgrading to zpool version 29/zfs version 5 on a S10 test system via the kernel patch 144501-19, it will now boot only as far as the grub menu.

What is a good Solaris rescue image that I can boot that will allow me to import this rpool to look at it (given the newer version)? Thanks.
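Once booted from rescue media whose ZFS code is at least pool version 29, the inspection step is a sketch like this (the altroot keeps rpool from mounting over the live rescue environment):

    # zpool import -f -R /a rpool
    # zfs list -r rpool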
[zfs-discuss] Validating a zfs send object
How do you verify that a zfs send binary object is valid? I tried running a truncated file through zstreamdump and it completed with no error messages and an exit() status of 0. However, I noticed it was missing the final print statement with a checksum value, "END checksum = ...".

Is there any normal circumstance under which this END checksum statement will be missing? More usefully, is there an option to zstreamdump, or a similar program, to validate an internal checksum value stored in a zfs send binary object? Or is the only way to do this with zfs receive? Thanks.
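Based on the observation above, a crude completeness check is simply to look for that summary line (a heuristic only, not a cryptographic verification; the file name is illustrative):

    # zstreamdump < backup.zsend | grep 'END checksum' || echo "likely truncated"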
Re: [zfs-discuss] Partitioning ARC
On Jan 30, 2011, at 6:03 PM, Richard Elling wrote:

> On Jan 30, 2011, at 5:01 PM, Stuart Anderson wrote:
>> On Jan 30, 2011, at 2:29 PM, Richard Elling wrote:
>>> On Jan 30, 2011, at 12:21 PM, stuart anderson wrote:
>>>> Is it possible to partition the global setting for the maximum ARC size with finer grained controls? Ideally, I would like to do this on a per zvol basis, but a setting per zpool would be interesting as well.
>>> While perhaps not perfect, see the primarycache and secondarycache properties of the zvol or file system.
>> With primarycache I can turn off utilization of the ARC for some zvols, but instead I was hoping to use the ARC but limit the maximum amount on a per zvol basis.
> Just a practical question, do you think the average storage admin will have any clue as to how to use this tunable?

Yes. I think the basic idea of partitioning a memory cache over different storage objects is a straightforward concept.

> How could we be more effective in communicating the features and pitfalls of resource management at this level?

Document that this is normally handled dynamically based on the default policy that all storage objects are assigned ARC space on a fair-share basis. However, if different quality of service is required for different storage objects, this may be adjusted as follows...

>>>> The use case is to prioritize which zvol devices should be fully cached in DRAM on a server that cannot fit them all in memory.
>>> It is not clear to me that this will make sense in a world of snapshots and dedup. Could you explain your requirements in more detail?
>> I am using zvols to hold the metadata for another filesystem (SAM-QFS). In some circumstances I can fit enough of this into the ARC that virtually all metadata read IOPS happen at DRAM performance rather than SSD or slower. However, with a single server hosting multiple filesystems (hence multiple zvols) I would like to be able to prioritize the use of the ARC.
> I think there is merit to this idea. It can be especially useful in the zone context. Please gather your thoughts and file an RFE at www.illumos.org

Not sure how to file an illumos RFE, but one simple model to think about would be a two-tiered system where by default ZFS datasets use the ARC as is currently the case, with no (to the best of my knowledge) relative priority, but some objects could optionally specify a request for a minimum size, e.g., add a companion attribute to primarycache named primarycachesize. This would represent the minimum amount of ARC space that is available for that object. Some thought would have to be given as to how to indicate if the sum of all primarycachesize settings is greater than zfs_arc_max, and to document what happens in this case, e.g., are all values ignored? Presumably something similar could also be considered for secondarycache. Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
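To make the proposal concrete, here is what the suggested companion property might look like; this is purely hypothetical syntax, since primarycachesize does not exist in any ZFS release:

    # zfs set primarycache=all tank/qfsmeta1
    # zfs set primarycachesize=8G tank/qfsmeta1    (proposed: minimum ARC share for this zvol)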
Re: [zfs-discuss] Query zfs send objects
On Jan 29, 2011, at 10:00 PM, Richard Elling wrote:

> On Jan 29, 2011, at 5:48 PM, stuart anderson wrote:
>> Is there a simple way to query zfs send binary objects for basic information such as:
>> 1) What snapshot they represent?
>> 2) When they were created?
>> 3) Whether they are the result of an incremental send?
>> 4) What the baseline snapshot was, if applicable?
>> 5) What ZFS version number they were made from?
>> 6) Anything else that would be useful to keep them around as backup binary blobs on an archival system, e.g., SAM-QFS?
> zstreamdump has a -v option which will show header information. The structure of that documentation is only shown in the source, though.

Thanks for the pointer. This has most of the information I am looking for.

Do you know how to get zstreamdump to display whether it is parsing an incremental dump, and if so, what snapshot it is relative to? Put another way, given 2 zfs send binary blobs, can I determine if they are related without trying to restore them to a ZFS filesystem? Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
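For reference, dumping the headers of an incremental stream without receiving it looks like this (snapshot names illustrative):

    # zfs send -i tank/fs@monday tank/fs@tuesday | zstreamdump -v | head -20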
[zfs-discuss] Partitioning ARC
Is it possible to partition the global setting for the maximum ARC size with finer grained controls? Ideally, I would like to do this on a per zvol basis, but a setting per zpool would be interesting as well.

The use case is to prioritize which zvol devices should be fully cached in DRAM on a server that cannot fit them all in memory. Thanks.
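For context, the only ARC sizing control available at this level today is the global cap set in /etc/system, which is what this question asks to subdivide (the value is illustrative and takes effect on reboot):

    set zfs:zfs_arc_max = 0x200000000

That is an 8 GB ceiling for the entire ARC, with no per-dataset or per-pool breakdown.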
Re: [zfs-discuss] Query zfs send objects
On Jan 30, 2011, at 1:49 PM, Richard Elling wrote:

> On Jan 30, 2011, at 11:19 AM, Stuart Anderson wrote:
>> On Jan 29, 2011, at 10:00 PM, Richard Elling wrote:
>>> On Jan 29, 2011, at 5:48 PM, stuart anderson wrote:
>>>> Is there a simple way to query zfs send binary objects for basic information such as: 1) What snapshot they represent? 2) When they were created? 3) Whether they are the result of an incremental send? 4) What the baseline snapshot was, if applicable? 5) What ZFS version number they were made from? 6) Anything else that would be useful to keep them around as backup binary blobs on an archival system, e.g., SAM-QFS?
>>> zstreamdump has a -v option which will show header information. The structure of that documentation is only shown in the source, though.
>> Thanks for the pointer. This has most of the information I am looking for. Do you know how to get zstreamdump to display whether it is parsing an incremental dump, and if so, what snapshot it is relative to? Put another way, given 2 zfs send binary blobs, can I determine if they are related without trying to restore them to a ZFS filesystem?
> Each incremental send stream has a from and a to Global Unique Identifier (GUID) for the snapshots. A send stream with many incremental snapshots will have many of these pairs.

Got it. That works. Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
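In practice the pairing can be pulled straight out of the stream headers; the field names below are as zstreamdump -v prints them (file name illustrative), with fromguid = 0 indicating a full rather than incremental stream:

    # zstreamdump -v < blob.zsend | egrep 'toname|toguid|fromguid'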
Re: [zfs-discuss] Partitioning ARC
On Jan 30, 2011, at 2:29 PM, Richard Elling wrote:

> On Jan 30, 2011, at 12:21 PM, stuart anderson wrote:
>> Is it possible to partition the global setting for the maximum ARC size with finer grained controls? Ideally, I would like to do this on a per zvol basis, but a setting per zpool would be interesting as well.
> While perhaps not perfect, see the primarycache and secondarycache properties of the zvol or file system.

With primarycache I can turn off utilization of the ARC for some zvols, but instead I was hoping to use the ARC but limit the maximum amount on a per zvol basis.

>> The use case is to prioritize which zvol devices should be fully cached in DRAM on a server that cannot fit them all in memory.
> It is not clear to me that this will make sense in a world of snapshots and dedup. Could you explain your requirements in more detail?

I am using zvols to hold the metadata for another filesystem (SAM-QFS). In some circumstances I can fit enough of this into the ARC that virtually all metadata read IOPS happen at DRAM performance rather than SSD or slower. However, with a single server hosting multiple filesystems (hence multiple zvols) I would like to be able to prioritize the use of the ARC.

P.S. Since SAM-QFS metadata is highly compressible, O(10x), it would also be great if there were an option to cache the compressed blocks in DRAM (and not just the decompressed version). Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Query zfs send objects
Is there a simple way to query zfs send binary objects for basic information such as:

1) What snapshot they represent?
2) When they were created?
3) Whether they are the result of an incremental send?
4) What the baseline snapshot was, if applicable?
5) What ZFS version number they were made from?
6) Anything else that would be useful to keep them around as backup binary blobs on an archival system, e.g., SAM-QFS?

Thanks.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Apr 2, 2010, at 5:08 AM, Edward Ned Harvey wrote:

>> I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit.
> It seems like it should be unnecessary. It seems like extra work. But based on my present experience, I reached the same conclusion. If my new replacement SSD with identical part number and firmware is 0.001 Gb smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? Nothing. That's what. I take it back. Me. I am to prevent it from happening. And the technique to do so is precisely as you've said. First slice every drive to be a little smaller than actual. Then later if I get a replacement device for the mirror that's slightly smaller than the others, I have no reason to care.

However, I believe there are some downsides to letting ZFS manage just a slice rather than an entire drive, but perhaps those do not apply as significantly to SSD devices?

Thanks

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
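A sketch of the technique under discussion (device names illustrative): carve a slice just below the nominal capacity with format(1M), then mirror the slice rather than the whole disk:

    # format -e c1t2d0
      (partition -> modify: size slice 0 a little below the full capacity)
    # zpool attach tank c1t0d0s0 c1t2d0s0

The classic downside being alluded to is that ZFS only enables the on-disk write cache automatically when given a whole disk, so with slices the write cache may need to be managed by hand.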
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Edward Ned Harvey solaris2 at nedharvey.com writes:

> Allow me to clarify a little further, why I care about this so much. I have a solaris file server, with all the company jewels on it. I had a pair of intel X.25 SSD mirrored log devices. One of them failed. The replacement device came with a newer version of firmware on it. Now, instead of appearing as 29.802 Gb, it appears at 29.801 Gb. I cannot zpool attach. New device is too small.
>
> So apparently I'm the first guy this happened to. Oracle is caught totally off guard. They're pulling their inventory of X25's from dispatch warehouses, and inventorying all the firmware versions, and trying to figure it all out. Meanwhile, I'm still degraded. Or at least, I think I am. Nobody knows any way for me to remove my unmirrored log device. Nobody knows any way for me to add a mirror to it (until they can locate a drive with the correct firmware.) All the support people I have on the phone are just as scared as I am. "Well, we could upgrade the firmware of your existing drive, but that'll reduce it by 0.001 Gb, and that might just create a time bomb to destroy your pool at a later date. So we don't do it." Nobody has suggested that I simply shutdown and remove my unmirrored SSD, and power back on.

We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware and recreated the volumes on new drives arriving from Sun, we got back into sync on the X25-E device sizes.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Mar 31, 2010, at 8:58 PM, Edward Ned Harvey wrote:

>> We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware and recreated the volumes on new drives arriving from Sun, we got back into sync on the X25-E device sizes.
> Can you elaborate? Just today, we got the replacement drive that has precisely the right version of firmware and everything. Still, when we plugged in that drive, and create simple volume in the storagetek raid utility, the new drive is 0.001 Gb smaller than the old drive. I'm still hosed. Are you saying I might benefit by sticking the SSD into some laptop, and zero'ing the disk? And then attach to the sun server? Are you saying I might benefit by finding some other way to make the drive available, instead of using the storagetek raid utility?

Assuming you are also using a PCI LSI HBA from Sun that is managed with a utility called /opt/StorMan/arcconf and reports itself as the amazingly informative model number "Sun STK RAID INT", what worked for me was to run:

    arcconf delete   (to delete the pre-configured volume shipped on the drive)
    arcconf create   (to create a new volume)

What I observed was that "arcconf getconfig 1" would show the same physical device size for our existing drives and new ones from Sun, but they reported a slightly different logical volume size. I am fairly sure that was due to the Sun factory creating the initial volume with a different version of the HBA controller firmware than we were using to create our own volumes.

If I remember the sign correctly, the newer firmware creates larger logical volumes, and you really want to upgrade the firmware if you are going to be running multiple X25-E drives from the same controller. I hope that helps.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
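A quick way to compare the physical and logical sizes the controller reports, assuming controller number 1 as above (the exact output labels vary by firmware revision):

    # /opt/StorMan/arcconf getconfig 1 | egrep -i 'size'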
Re: [zfs-discuss] ZFS caching of compressed data
On Oct 2, 2009, at 11:54 AM, Robert Milkowski wrote:

> Stuart Anderson wrote:
>> On Oct 2, 2009, at 5:05 AM, Robert Milkowski wrote:
>>> Stuart Anderson wrote:
>>>> I am wondering if the following idea makes any sense as a way to get ZFS to cache compressed data in DRAM? In particular, given a 2-way zvol mirror of highly compressible data on persistent storage devices, what would go wrong if I dynamically added a ramdisk as a 3rd mirror device at boot time? Would ZFS route most (or all) of the reads to the lower latency DRAM device? In the case of an un-clean shutdown where there was no opportunity to actively remove the ramdisk from the pool before shutdown, would there be any problem at boot time when the ramdisk is still registered but unavailable? Note, this Gedanken experiment is for highly compressible (~9x) metadata for a non-ZFS filesystem.
>>> You would only get about 33% of IO's served from ram-disk.
>> With SVM you are allowed to specify a read policy on sub-mirrors for just this reason, e.g.,
>> http://wikis.sun.com/display/BigAdmin/Using+a+SVM+submirror+on+a+ramdisk+to+increase+read+performance
>> Is there no equivalent in ZFS?
> Nope, at least not right now.

Curious if anyone knows of any other ideas/plans for ZFS caching compressed data internally, or externally via a ramdisk mirror device that handles most/all read requests? Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
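The Gedanken experiment expressed as commands, a sketch only (pool and device names illustrative); as noted above, without a per-submirror read policy ZFS spreads reads across all three sides, so only ~1/3 would come from DRAM:

    # ramdiskadm -a metamirror 8g
    # zpool attach metapool c0t1d0 /dev/ramdisk/metamirror
      ...
    # zpool detach metapool /dev/ramdisk/metamirror    (before a clean shutdown)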
Re: [zfs-discuss] force 4k writes
On Dec 17, 2009, at 9:21 PM, Richard Elling wrote:

> On Dec 17, 2009, at 9:04 PM, stuart anderson wrote:
>> As a specific example of 2 devices with dramatically different performance for sub-4k transfers, has anyone done any ZFS benchmarks between the X25E and the F20 they can share? I am particularly interested in zvol performance with a blocksize of 16k and highly compressible data (~10x).
> 16 KB recordsize? That seems a little unusual, what is the application?

SAM-QFS metadata, whose fundamental disk allocation unit (DAU) size for metadata is 16kB.

>> I am going to run some comparison tests but would appreciate any initial input on what to look out for or how to tune ZFS to get the most out of the F20.
> AFAICT, no tuning should be required. It is quite fast.

>> It might be helpful, e.g., if there were somewhere in the software stack where I could tell part of the system to lie and treat the F20 as a 4k device?
> The F20 is rated at 84,000 random 4KB write IOPS. The DRAM write buffer will hide 4KB write effects.

Not from some direct vdbench comparison results I have seen. My main concern here has to do with ZFS compression, which I need for my application, breaking up the transfer sizes the F20 sees into smaller-than-4KB writes, where there is a critical performance difference. I also suspect/hope that SAM-QFS is telling ZFS to aggressively flush/commit any metadata updates to stable storage, which probably aggravates the problem, though I have not tested this yet.

> OTOH, the X-25E is rated at 3,300 random 4KB writes. It shouldn't take much armchair analysis to come to the conclusion that the F20 is likely to win that IOPS battle :-)

Though to be fair you should probably compare a single F20 DOM to an X25-E, or 4 X25-E's to a full F20, and of course my systems don't run from an armchair :)

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
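For reference, the kind of zvol under discussion can be created as follows (names and size illustrative):

    # zfs create -V 100g -b 16k -o compression=gzip-1 tank/qfsmeta
    # zfs get volblocksize,compression tank/qfsmeta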
Re: [zfs-discuss] force 4k writes
> On Wed, Dec 16 at 7:35, Bill Sprouse wrote:
>> The question behind the question is, given the really bad things that can happen performance-wise with writes that are not 4k aligned when using flash devices, is there any way to insure that any and all writes from ZFS are 4k aligned?
> Some flash devices can handle this better than others, often several orders of magnitude better. Not all devices (as you imply) are so-affected.

As a specific example of 2 devices with dramatically different performance for sub-4k transfers, has anyone done any ZFS benchmarks between the X25E and the F20 they can share? I am particularly interested in zvol performance with a blocksize of 16k and highly compressible data (~10x).

I am going to run some comparison tests but would appreciate any initial input on what to look out for or how to tune ZFS to get the most out of the F20. It might be helpful, e.g., if there were somewhere in the software stack where I could tell part of the system to lie and treat the F20 as a 4k device?

Thanks.
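One way to quantify the sub-4k penalty on a candidate device is a vdbench sweep over transfer sizes; a destructive sketch against a raw device (the device path and vdbench install location are assumptions):

    # cat > xfersweep.parm <<'EOF'
    sd=ssd,lun=/dev/rdsk/c2t1d0s0,threads=16
    wd=wr,sd=ssd,rdpct=0,seekpct=random
    rd=sweep,wd=wr,iorate=max,elapsed=60,interval=5,forxfersize=(512,2k,4k,16k)
    EOF
    # /opt/vdbench/vdbench -f xfersweep.parm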
Re: [zfs-discuss] zvol used apparently greater than volsize for sparse volume
Cindy,

Thanks for the pointer. Until this is resolved, is there some documentation available that will let me calculate this by hand? I would like to know how large the current 3-4% metadata storage I am observing can potentially grow. Thanks.

On Oct 20, 2009, at 8:57 AM, Cindy Swearingen wrote:

> Hi Stuart,
>
> The reason why used is larger than the volsize is because we aren't accounting for metadata, which is covered by this CR:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429996
> 6429996 zvols don't reserve enough space for requisite meta data
>
> Metadata is usually only a small percentage. Sparse-ness is not a factor here. Sparse just means we ignore the reservation so you can create a zvol bigger than what we'd normally allow.
>
> Cindy
>
> On 10/17/09 13:47, Stuart Anderson wrote:
>> What does it mean for the reported value of a zvol volsize to be less than the product of used and compressratio?

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] zvol used apparently greater than volsize for sparse volume
What does it mean for the reported value of a zvol volsize to be less than the product of used and compressratio? For example,

# zfs get -p all home1/home1mm01
NAME             PROPERTY        VALUE         SOURCE
home1/home1mm01  type            volume        -
home1/home1mm01  creation        1254440045    -
home1/home1mm01  used            14902492672   -
home1/home1mm01  available       16240062464   -
home1/home1mm01  referenced      14902492672   -
home1/home1mm01  compressratio   11.20x        -
home1/home1mm01  reservation     0             default
home1/home1mm01  volsize         161061273600  -
home1/home1mm01  volblocksize    16384         -
home1/home1mm01  checksum        on            default
home1/home1mm01  compression     gzip-1        inherited from home1
home1/home1mm01  readonly        off           default
home1/home1mm01  shareiscsi      off           default
home1/home1mm01  copies          1             default
home1/home1mm01  refreservation  0             default

Yet used (14902492672) * compressratio (11.20) = 166907917926, which is 3.6% larger than volsize.

Is this a bug or a feature for sparse volumes? If a feature, how much larger than volsize/compressratio can the actual used storage space grow, e.g., a fixed amount of overhead and/or a fixed percentage?

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
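The arithmetic above can be reproduced for any zvol with a one-liner (a sketch; the trailing "x" on compressratio is stripped defensively in case -p leaves it in place):

    # U=`zfs get -Hp -o value used home1/home1mm01`
    # R=`zfs get -Hp -o value compressratio home1/home1mm01 | sed 's/x$//'`
    # V=`zfs get -Hp -o value volsize home1/home1mm01`
    # echo "$U $R $V" | nawk '{printf("metadata overhead = %.1f%%\n", 100*($1*$2/$3 - 1))}'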
Re: [zfs-discuss] ZFS caching of compressed data
On Oct 2, 2009, at 5:05 AM, Robert Milkowski wrote:

> Stuart Anderson wrote:
>> I am wondering if the following idea makes any sense as a way to get ZFS to cache compressed data in DRAM? In particular, given a 2-way zvol mirror of highly compressible data on persistent storage devices, what would go wrong if I dynamically added a ramdisk as a 3rd mirror device at boot time? Would ZFS route most (or all) of the reads to the lower latency DRAM device? In the case of an un-clean shutdown where there was no opportunity to actively remove the ramdisk from the pool before shutdown, would there be any problem at boot time when the ramdisk is still registered but unavailable? Note, this Gedanken experiment is for highly compressible (~9x) metadata for a non-ZFS filesystem.
> You would only get about 33% of IO's served from ram-disk.

With SVM you are allowed to specify a read policy on sub-mirrors for just this reason, e.g.,
http://wikis.sun.com/display/BigAdmin/Using+a+SVM+submirror+on+a+ramdisk+to+increase+read+performance
Is there no equivalent in ZFS?

> However, at the KCA conference Bill and Jeff mentioned just-in-time decompression/decryption planned for ZFS. If I understand it correctly, some % of pages in the ARC will be kept compressed/encrypted and will be decompressed/decrypted only if accessed. This could be especially useful to do with prefetch.

I thought the optimization being discussed there was simply to avoid decompressing/decrypting unused data. I missed the part about keeping compressed data around in the ARC.

> Now I would imagine that one will be able to tune what percentage of the ARC should keep compressed pages.

That would be nice.

> Now I don't remember if they mentioned L2ARC here, but it would probably be useful to have a tunable which would put compressed or uncompressed data onto L2ARC depending on its value. Which approach is better would always depend on a given environment and on where an actual bottleneck is.

I agree something like this would be preferable to the SVM ramdisk solution.

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] ZFS caching of compressed data
I am wondering if the following idea makes any sense as a way to get ZFS to cache compressed data in DRAM?

In particular, given a 2-way zvol mirror of highly compressible data on persistent storage devices, what would go wrong if I dynamically added a ramdisk as a 3rd mirror device at boot time? Would ZFS route most (or all) of the reads to the lower latency DRAM device?

In the case of an un-clean shutdown where there was no opportunity to actively remove the ramdisk from the pool before shutdown, would there be any problem at boot time when the ramdisk is still registered but unavailable?

Note, this Gedanken experiment is for highly compressible (~9x) metadata for a non-ZFS filesystem.

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Transient permanent errors
I have seen this again on a different server. Presumably not a big deal, but a false alarm about data corruption is probably not good marketing for ZFS. Is this fixed in an opensolaris build?

# pca -l a -p ZFS
Using /var/tmp/patchdiag.xref from Sep/11/09
Host: samhome1 (SunOS 5.10/Generic_141415-10/i386/i86pc)
List: a (2/88)

Patch  IR   CR RSB Age Synopsis
------ -- - -- --- --- --------------------------------------------------------
141105 02 = 02 ---  58 SunOS 5.10_x86: ZFS Administration Java Web Console Patch
141909 03 = 03 R--  30 SunOS 5.10_x86: ZFS patch

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h7m, 93.90% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk/c0t0d0s0
        //dev/dsk/c0t1d0s0

# zpool status -v rpool
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h8m with 0 errors on Sun Sep 13 17:22:47 2009
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0

errors: No known data errors

Thanks.

On Jun 28, 2009, at 7:31 PM, Stuart Anderson wrote:

> This is S10U7 fully patched and not open solaris, but I would appreciate any advice on the following transient "Permanent error" message generated while running a zpool scrub.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Change the volblocksize of a ZFS volume
>>> Question: Is there a way to change the volume blocksize, say via 'zfs snapshot send/receive'? As I see things, this isn't possible as the target volume (including property values) gets overwritten by 'zfs receive'.
>> By default, properties are not received. To pass properties, you need to use the -R flag.
> I have tried that, and while it works for properties like compression, I have not found a way to preserve a non-default volblocksize across zfs send | zfs receive. The zvol created on the receive side is always defaulting to 8k. Is there a way to do this?

I spoke too soon. More particularly, during the zfs send/recv process the receiving side reports 8k, but once the receive is done the volblocksize reports the expected value as sent with -R. Hopefully, this is just a reporting bug during an active receive.

Note, this was observed with s10u7 (x86). Thanks.
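For reference, the sequence under discussion (pool and volume names illustrative); checking volblocksize on the receive side mid-stream is what produces the transient 8k reading:

    # zfs snapshot tank/vol@migrate
    # zfs send -R tank/vol@migrate | zfs receive -d tank2
    # zfs get volblocksize tank2/vol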
Re: [zfs-discuss] Change the volblocksize of a ZFS volume
>> Question: Is there a way to change the volume blocksize, say via 'zfs snapshot send/receive'? As I see things, this isn't possible as the target volume (including property values) gets overwritten by 'zfs receive'.
> By default, properties are not received. To pass properties, you need to use the -R flag.

I have tried that, and while it works for properties like compression, I have not found a way to preserve a non-default volblocksize across zfs send | zfs receive. The zvol created on the receive side is always defaulting to 8k. Is there a way to do this?

Thanks.
[zfs-discuss] Transient permanent errors
This is S10U7 fully patched and not open solaris, but I would appreciate any advice on the following transient "Permanent error" message generated while running a zpool scrub.

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h8m, 57.22% done, 0h6m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //dev/dsk/c2t0d0s0
        //dev/dsk/c2t1d0s0

Disconcerting that the actual pool devices are flagged as corrupt. However, these are just symbolic links, and when the scrub completed the Permanent errors had disappeared:

# zpool status -v rpool
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h19m with 0 errors on Sun Jun 28 19:21:19 2009
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0

errors: No known data errors

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Speeding up resilver on x4500
On Jun 23, 2009, at 11:50 AM, Richard Elling wrote:

>> (2) is there some reasonable way to read in multiples of these blocks in a single IOP? Theoretically, if the blocks are in chronological creation order, they should be (relatively) sequential on the drive(s). Thus, ZFS should be able to read in several of them without forcing a random seek. That is, you should be able to get multiple blocks in a single IOP.
> Metadata is prefetched. You can look at the hit rate in kstats. Stuart, you might post the output of
>     kstat -n vdev_cache_stats
> I regularly see cache hit rates in the 60% range, which isn't bad considering what is being cached.

# kstat -n vdev_cache_stats
module: zfs                             instance: 0
name:   vdev_cache_stats                class:    misc
        crtime                          129.03798177
        delegations                     25873382
        hits                            114064783
        misses                          182253696
        snaptime                        960064.85352608

Here are also some zpool iostat numbers during this resilver:

# zpool iostat ldas-cit1 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ldas-cit1   16.9T  3.49T    165    134  5.17M  1.58M
ldas-cit1   16.9T  3.49T    225    237  1.28M  1.98M
ldas-cit1   16.9T  3.49T    288    317  1.53M  2.26M
ldas-cit1   16.9T  3.49T    174    269  1014K  1.68M

And here is the pool configuration:

# zpool status ldas-cit1
  pool: ldas-cit1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 96h49m, 63.69% done, 55h12m to go
config:

        NAME               STATE     READ WRITE CKSUM
        ldas-cit1          DEGRADED     0     0     0
          raidz2           DEGRADED     0     0     0
            c0t1d0         ONLINE       0     0     0
            c1t1d0         ONLINE       0     0     0
            c3t1d0         ONLINE       0     0     0
            c4t1d0         ONLINE       0     0     0
            c5t1d0         ONLINE       0     0     0
            c6t1d0         ONLINE       0     0     0
            c0t2d0         ONLINE       0     0     0
            c1t2d0         ONLINE       0     0     0
            c3t2d0         ONLINE       0     0     0
            c4t2d0         ONLINE       0     0     0
            c5t2d0         ONLINE       0     0     0
            spare          DEGRADED     0     0     0
              replacing    DEGRADED     0     0     0
                c6t2d0s0/o FAULTED      0     0     0  corrupted data
                c6t2d0     ONLINE       0     0     0
              c6t0d0       ONLINE       0     0     0
            c0t3d0         ONLINE       0     0     0
            c1t3d0         ONLINE       0     0     0
            c3t3d0         ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            c4t3d0         ONLINE       0     0     0
            c5t3d0         ONLINE       0     0     0
            c6t3d0         ONLINE       0     0     0
            c0t4d0         ONLINE       0     0     0
            c1t4d0         ONLINE       0     0     0
            c3t4d0         ONLINE       0     0     0
            c5t0d0         ONLINE       0     0     0
            c5t4d0         ONLINE       0     0     0
            c6t4d0         ONLINE       0     0     0
            c0t5d0         ONLINE       0     0     0
            c1t5d0         ONLINE       0     0     0
            c3t5d0         ONLINE       0     0     0
            c4t5d0         ONLINE       0     0     0
            c5t5d0         ONLINE       0     0     0
            c6t5d0         ONLINE       0     0     0
          raidz2           ONLINE       0     0     0
            c0t6d0         ONLINE       0     0     0
            c1t6d0         ONLINE       0     0     0
            c3t6d0         ONLINE       0     0     0
            c4t6d0         ONLINE       0     0     0
            c5t6d0         ONLINE       0     0     0
            c6t6d0         ONLINE       0     0     0
            c0t7d0         ONLINE       0     0     0
            c1t7d0         ONLINE       0     0     0
            c3t7d0         ONLINE       0     0     0
            c4t7d0         ONLINE       0     0     0
            c5t7d0         ONLINE       0     0     0
            c6t7d0         ONLINE       0     0     0
            c0t0d0         ONLINE       0     0     0
            c1t0d0         ONLINE       0     0     0
            c3t0d0         ONLINE       0     0     0
        spares
          c6t0d0           INUSE     currently in use

errors: No known data errors

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
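Turning those counters into a hit rate (a sketch using kstat's parsable output); for the numbers above this works out to 114064783/296318479, i.e. about 38%, noticeably below the 60% figure cited:

    # kstat -p zfs:0:vdev_cache_stats:hits zfs:0:vdev_cache_stats:misses | \
        nawk '{v[NR]=$2} END {printf("vdev cache hit rate = %.1f%%\n", 100*v[1]/(v[1]+v[2]))}'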
Re: [zfs-discuss] Speeding up resilver on x4500
On Jun 21, 2009, at 10:21 PM, Nicholas Lee wrote:

> On Mon, Jun 22, 2009 at 4:24 PM, Stuart Anderson ander...@ligo.caltech.edu wrote:
>> However, it is a bit disconcerting to have to run with reduced data protection for an entire week. While I am certainly not going back to UFS, it seems like it should be at least theoretically possible to do this several orders of magnitude faster, e.g., what if every block on the replacement disk had its RAIDZ2 data recomputed from the degraded
> Maybe this is also saying that for large disk sets a single RAIDZ2 provides a false sense of security.

This configuration is with 3 large RAIDZ2 devices, but I have more recently been building thumper/thor systems with a larger number of smaller RAIDZ2's.

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Speeding up resilver on x4500
It is currently taking ~1 week to resilver an x4500 running S10U6 (recently patched) with ~170M small files on ~170 datasets after a disk failure/replacement, i.e.,

    scrub: resilver in progress for 53h47m, 30.72% done, 121h19m to go

Is there anything that can be tuned to improve this performance, e.g., adding a faster cache device for reading and/or writing?

I am also curious if anyone has a prediction on when the snapshot-restarting-resilvering bug will be patched in Solaris 10?
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Speeding up resilver on x4500
On Jun 21, 2009, at 8:57 PM, Richard Elling wrote:

> Stuart Anderson wrote:
>> It is currently taking ~1 week to resilver an x4500 running S10U6 (recently patched) with ~170M small files on ~170 datasets after a disk failure/replacement, i.e.,
> wow, that is impressive. There is zero chance of doing that with a manageable number of UFS file systems.

However, it is a bit disconcerting to have to run with reduced data protection for an entire week. While I am certainly not going back to UFS, it seems like it should be at least theoretically possible to do this several orders of magnitude faster, e.g., what if every block on the replacement disk had its RAIDZ2 data recomputed from the degraded array regardless of whether the pool was using it or not? In that case I would expect it to be able to sequentially reconstruct in the same few hours it would take a HW RAID controller to do the same RAID6 job.

Perhaps there needs to be an option to re-order the loops for resilvering on pools with lots of small files, to resilver in device order rather than filesystem order?

>> scrub: resilver in progress for 53h47m, 30.72% done, 121h19m to go
>> Is there anything that can be tuned to improve this performance, e.g., adding a faster cache device for reading and/or writing?
> Resilver tends to be bound by one of two limits:
>   1. sequential write performance of the resilvering device
>   2. random I/O performance of the non-resilvering devices

A quick look at iostat leads me to conjecture that the vdev rebuilding is taking a very low priority compared to ongoing application I/O (NFSD in this case). Are there any ZFS knobs that control the relative priority of resilvering to other disk I/O tasks?

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
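For later readers: S10U6 exposes no supported priority knob, but contemporary OpenSolaris builds carry a zfs_scrub_limit tunable, and later builds added zfs_resilver_delay/zfs_resilver_min_time_ms; whether a given kernel has such a symbol can be checked, and the value adjusted, with mdb (unsupported, values illustrative):

    # echo "zfs_scrub_limit/D" | mdb -k         (read the current value)
    # echo "zfs_scrub_limit/W 0t20" | mdb -kw   (allow more concurrent scrub I/Os)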
Re: [zfs-discuss] Confused by compressratio
On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:

> UTSL. compressratio is the ratio of uncompressed bytes to compressed bytes.
> http://cvs.opensolaris.org/source/search?q=ZFS_PROP_COMPRESSRATIO&defs=&refs=&path=zfs&hist=&project=%2Fonnv
> IMHO, you will (almost) never get the same number looking at bytes as you get from counting blocks.

If I can't use /bin/ls to get an accurate measure of the number of compressed blocks used (-s) and the original number of uncompressed bytes (-l), what is a more accurate way to measure these?

As a gedanken experiment, what command(s) can I run to examine a compressed ZFS filesystem and determine how much space it will require to replicate to an uncompressed ZFS filesystem? I can add up the file sizes, e.g.,

    /bin/ls -lR | grep ^- | nawk '{SUM+=$5} END {print SUM}'

but I would have thought there was a more efficient way using the already aggregated filesystem metadata via /bin/df or zfs list and the compressratio.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
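One shortcut for the gedanken experiment, assuming GNU du is available (apparent size approximates the uncompressed bytes a replica would need, modulo sparse files and metadata overhead):

    # du --apparent-size -sk /export/compress   (logical size, GNU du only)
    # du -sk /export/compress                   (blocks actually allocated)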
Re: [zfs-discuss] Confused by compressratio
On Wed, Apr 16, 2008 at 10:09:00AM -0700, Richard Elling wrote:

> Stuart Anderson wrote:
>> On Tue, Apr 15, 2008 at 03:51:17PM -0700, Richard Elling wrote:
>>> UTSL. compressratio is the ratio of uncompressed bytes to compressed bytes.
>>> IMHO, you will (almost) never get the same number looking at bytes as you get from counting blocks.
>> If I can't use /bin/ls to get an accurate measure of the number of compressed blocks used (-s) and the original number of uncompressed bytes (-l), what is a more accurate way to measure these?
> ls -s should give the proper number of blocks used. ls -l should give the proper file length. Do not assume that compressed data in a block consumes the whole block.

Not even on a pristine ZFS filesystem where just one file has been created?

>> As a gedanken experiment, what command(s) can I run to examine a compressed ZFS filesystem and determine how much space it will require to replicate to an uncompressed ZFS filesystem? I can add up the file sizes, but I would have thought there was a more efficient way using the already aggregated filesystem metadata via /bin/df or zfs list and the compressratio.
> IMHO, this is a by-product of the dynamic nature of ZFS.

Are you saying it can't be done except by adding up all the individual file sizes?

> Personally, I'd estimate using du rather than ls.

They report the exact same number as far as I can tell. With the caveat that Solaris ls -s returns the number of 512-byte blocks, whereas GNU ls -s returns the number of 1024-byte blocks by default.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Confused by compressratio
On Wed, Apr 16, 2008 at 02:07:53PM -0700, Richard Elling wrote:

>>> Personally, I'd estimate using du rather than ls.
>> They report the exact same number as far as I can tell. With the caveat that Solaris ls -s returns the number of 512-byte blocks, whereas GNU ls -s returns the number of 1024-byte blocks by default.
> That is file-system dependent. Some file systems have larger blocks and ls -s shows the size in blocks. ZFS uses dynamic block sizes, but you knew that already... :-)
> -- richard

OK, we are now clearly exposing my ignorance, so hopefully I can learn something new about ZFS.

What is the distinction/relationship between recordsize (which, as I understand it, is a fixed quantity for each ZFS dataset) and dynamic block sizes? Are blocks what are allocated for metadata, and records what are allocated for data, i.e., the contents of files?

What does it mean that blocks are compressed for a ZFS dataset with compression=off? Is this equivalent to saying that ZFS metadata is always compressed?

Is there any ZFS documentation that shows by example exactly how to interpret the various numbers from ls, du, df, and zfs used/referenced/available/compressratio in the context of compression={on,off}, possibly also referring to both sparse and non-sparse files?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Confused by compressratio
On Tue, Apr 15, 2008 at 01:37:43PM -0400, Luke Scharf wrote:

>> # zfs list /export/compress
>> NAME                  USED  AVAIL  REFER  MOUNTPOINT
>> export-cit/compress  90.4M  1.17T  90.4M  /export/compress
>> is 2GB/90.4M = 2048 / 90.4 = 22.65
>> That still leaves me puzzled what the precise definition of compressratio is?
> My guess is that the compressratio doesn't include any of those runs of null characters that weren't actually written to the disk.

This test was done with a file created via "/bin/yes | head", i.e., it does not have any null characters, specifically to rule out this possibility.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Confused by compressratio
On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:

> Stuart Anderson wrote:
>> As an artificial test, I created a filesystem with compression enabled and ran mkfile 1g, and the reported compressratio for that filesystem is 1.00x even though this 1GB file uses only 1kB.
> ZFS seems to treat files filled with zeroes as sparse files, regardless of whether or not compression is enabled. Try "dd if=/dev/urandom of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit this behavior. Creating this file is a lot slower than writing zeroes (mostly due to the speed of the urandom device), but ZFS won't treat it like a sparse file, and it won't compress very well either.

However, I am still trying to reconcile the compression ratio as reported by compressratio vs the ratio of file sizes to disk blocks used (whether or not ZFS is creating sparse files).

Regarding sparse files, I recently found that the built-in heuristic for auto-detecting and creating sparse files in the GNU cp program works on ZFS filesystems. In particular, if you use GNU cp to copy a file from ZFS and it has a string of null characters in it (whether or not it is stored as a sparse file), the output file (regardless of the destination filesystem type) will be a sparse file. I have not seen this behavior when copying such files from other source filesystems.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
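GNU cp's sparse handling is also controllable, which helps when a byte-for-byte copy is wanted for measurement (these flags are GNU-specific; the destination path is illustrative):

    # cp --sparse=never 1g.dat /ufs/1g.dat    (force a fully allocated copy)
    # cp --sparse=always 1g.dat /ufs/1g.dat   (punch holes wherever possible)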
Re: [zfs-discuss] Confused by compressratio
On Mon, Apr 14, 2008 at 05:22:03PM -0400, Luke Scharf wrote:

> Stuart Anderson wrote:
>> On Mon, Apr 14, 2008 at 09:59:48AM -0400, Luke Scharf wrote:
>>> Stuart Anderson wrote:
>>>> As an artificial test, I created a filesystem with compression enabled and ran mkfile 1g, and the reported compressratio for that filesystem is 1.00x even though this 1GB file uses only 1kB.
>>> ZFS seems to treat files filled with zeroes as sparse files, regardless of whether or not compression is enabled. Try "dd if=/dev/urandom of=1g.dat bs=1024 count=1048576" to create a file that won't exhibit this behavior. Creating this file is a lot slower than writing zeroes (mostly due to the speed of the urandom device), but ZFS won't treat it like a sparse file, and it won't compress very well either.
>> However, I am still trying to reconcile the compression ratio as reported by compressratio vs the ratio of file sizes to disk blocks used (whether or not ZFS is creating sparse files).
> Can you describe the data you're storing a bit? Any big disk images?

Understanding the mkfile case would be a start, but the initial filesystem that started my confusion is one that has a number of ~50GByte mysql database files as well as a number of application code files.

Here is another simple test to avoid any confusion/bugs related to NULL character sequences being compressed to nothing versus being treated as sparse files. In particular, a 2GByte file full of the output of /bin/yes:

# zfs create export-cit/compress
# cd /export/compress
# /bin/df -k .
Filesystem            kbytes      used   avail       capacity  Mounted on
export-cit/compress   1704858624  55     1261199742  1%        /export/compress
# zfs get compression export-cit/compress
NAME                 PROPERTY     VALUE  SOURCE
export-cit/compress  compression  on     inherited from export-cit
# /bin/yes | head -1073741824 > yes.dat
# /bin/ls -ls yes.dat
185017 -rw-r--r--   1 root     root     2147483648 Apr 14 15:31 yes.dat
# /bin/df -k .
Filesystem            kbytes      used   avail       capacity  Mounted on
export-cit/compress   1704858624  92563  1261107232  1%        /export/compress
# zfs get compressratio export-cit/compress
NAME                 PROPERTY       VALUE   SOURCE
export-cit/compress  compressratio  28.39x  -

So compressratio reports 28.39, but the ratio of file size to used disk for the only regular file on this filesystem, i.e., excluding the initial 55kB allocated for the empty filesystem, is:

    2147483648 / (185017 * 512) = 22.67

Calculated another way from zfs list for the entire filesystem:

# zfs list /export/compress
NAME                  USED  AVAIL  REFER  MOUNTPOINT
export-cit/compress  90.4M  1.17T  90.4M  /export/compress

is 2GB/90.4M = 2048 / 90.4 = 22.65.

That still leaves me puzzled what the precise definition of compressratio is?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Confused by compressratio
I am confused by the numerical value of compressratio. I copied a compressed ZFS filesystem that is 38.5G in size (zfs list USED and REFER value) and reports a compressratio value of 2.52x to an uncompressed ZFS filesystem, and it expanded to 198G. So why is the compressratio 2.52 rather than 198/38.5 = 5.14?

As an artificial test, I created a filesystem with compression enabled and ran mkfile 1g, and the reported compressratio for that filesystem is 1.00x even though this 1GB file uses only 1kB.

Note, this was done with ZFS version 4 on S10U4. I would appreciate any help in understanding what compressratio means.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] scrub performance
I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07) for which zpool scrub is making very slow progress even though the necessary resources are apparently available. Currently it has been running for 3 days to reach 75% completion; however, in the last 12 hours this only advanced by 3%.

At times this server is busy running NFSD, and it is understandable that the scrub would take a lower priority; however, I have observed interestingly long time intervals when neither prstat nor iostat show any obvious bottlenecks, e.g., disks at 10% busy. Is there a throttle on scrub resource allocation that does not readily open up again after being limited due to other system activity?

For comparison, an identical system (same OS/zpool config, and roughly the same number of filesystems and files) finished a scrub in 2 days. This is not a critical problem, but at least initially it was clear from iostat that the scrub was using all the disk IOPS/BW available, and I am curious why it has backed off from that after a few days of running.

P.S. I realize it is not a user command and that the last event can be found in zpool status, but I would find it convenient if the scrub completion event was also logged in the zpool history along with the initiation event.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
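An aside for archive readers: on builds whose history code records internal events, scrub completion does appear in the extended history (the -i flag may not exist on S10U4; pool name illustrative):

    # zpool history -i tank | grep -i scrub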
Re: [zfs-discuss] scrub performance
On Thu, Mar 06, 2008 at 11:51:00AM -0800, Stuart Anderson wrote:

> I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07) for which zpool scrub is making very slow progress even though the necessary resources are apparently available. Currently it has

It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?

Thanks.

# zpool status | egrep -e 'progress|errors' ; date
 scrub: scrub in progress, 75.49% done, 28h51m to go
errors: No known data errors
Thu Mar  6 08:50:59 PST 2008
# zpool status | egrep -e 'progress|errors' ; date
 scrub: scrub in progress, 75.24% done, 31h20m to go
errors: No known data errors
Thu Mar  6 15:15:39 PST 2008

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] scrub performance
On Thu, Mar 06, 2008 at 05:55:53PM -0800, Marion Hakanson wrote:

> [EMAIL PROTECTED] said:
>> It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?
> Sorry I don't have any helpful experience in this area. It occurs to me that perhaps you are detecting a gravity wave of some sort -- Thumpers are pretty heavy, and thus may be more affected than the average server. Or the guys at SLAC have, unbeknownst to you, somehow accelerated your Thumper to near the speed of light. (:-)

If true, that would certainly help, since we actually are using these thumpers to help detect gravitational waves! See http://www.ligo.caltech.edu.

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
In this particular case, will 127729-07 contain all the bug fixes in IDR127787-12 (or later)? I have also run into a few other kernel panics addressed in earlier revisions of this IDR, but I am eager to get back on the main Sol10 branch.

Thanks.

On Mon, Feb 18, 2008 at 08:45:46PM -0800, Prabahar Jeyaram wrote:

> Any IDRXX (released immediately) is the interim relief (which will also contain the fix) provided to customers until the official patch (usually takes longer to be released) is available. The patch is to be considered the permanent solution.
>
> -- Prabahar.
>
> Stuart Anderson wrote:
>> Thanks for the information. How does the temporary patch 127729-07 relate to the IDR127787 (x86) which I believe also claims to fix this panic?

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
Is this kernel panic a known ZFS bug, or should I open a new ticket? Note, this happened on an X4500 running S10U4 (127112-06) with NCQ disabled.

Thanks.

Feb 18 17:55:18 thumper1 ^Mpanic[cpu1]/thread=fe8000809c80:
Feb 18 17:55:18 thumper1 genunix: [ID 403854 kern.notice] assertion failed: arc_buf_remove_ref(db->db_buf, db) == 0, file: ../../common/fs/zfs/dbuf.c, line: 1692
Feb 18 17:55:18 thumper1 unix: [ID 10 kern.notice]
Feb 18 17:55:18 thumper1 genunix: [ID 802836 kern.notice] fe80008099d0 fb9c9853 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809a00 zfs:zfsctl_ops_root+2fac59f2 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809a30 zfs:dbuf_write_done+c8 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809a70 zfs:arc_write_done+13b ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809ac0 zfs:zio_done+1b8 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809ad0 zfs:zio_next_stage+65 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b00 zfs:zio_wait_for_children+49 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b10 zfs:zio_wait_children_done+15 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b20 zfs:zio_next_stage+65 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b60 zfs:zio_vdev_io_assess+84 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809b70 zfs:zio_next_stage+65 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809bd0 zfs:vdev_mirror_io_done+c1 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809be0 zfs:zio_vdev_io_done+14 ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809c60 genunix:taskq_thread+bc ()
Feb 18 17:55:18 thumper1 genunix: [ID 655072 kern.notice] fe8000809c70 unix:thread_start+8 ()
Feb 18 17:55:18 thumper1 unix: [ID 10 kern.notice]

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
On Mon, Feb 18, 2008 at 06:28:31PM -0800, Stuart Anderson wrote:

> Is this kernel panic a known ZFS bug, or should I open a new ticket?
>
> Feb 18 17:55:18 thumper1 genunix: [ID 403854 kern.notice] assertion failed: arc_buf_remove_ref(db->db_buf, db) == 0, file: ../../common/fs/zfs/dbuf.c, line: 1692

It looks like this might be bug 6523336,
http://sunsolve.sun.com/search/document.do?assetkey=1-66-201229-1

Does anyone know when the "Binary relief" for this and other Sol10 ZFS kernel panics will be released as normal kernel patches?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Kernel panic on arc_buf_remove_ref() assertion
Thanks for the information. How does the temporary patch 127729-07 relate to the IDR127787 (x86) which I believe also claims to fix this panic?

Thanks.

On Mon, Feb 18, 2008 at 08:32:03PM -0800, Prabahar Jeyaram wrote:

> The patches (127728-06 : sparc, 127729-07 : x86) which have the fix for this panic are in a temporary state and will be released via SunSolve soon. Please contact your support channel to get these patches.
>
> -- Prabahar.
>
> Stuart Anderson wrote:
>> On Mon, Feb 18, 2008 at 06:28:31PM -0800, Stuart Anderson wrote:
>>> Is this kernel panic a known ZFS bug, or should I open a new ticket?
>>> Feb 18 17:55:18 thumper1 genunix: [ID 403854 kern.notice] assertion failed: arc_buf_remove_ref(db->db_buf, db) == 0, file: ../../common/fs/zfs/dbuf.c, line: 1692
>> It looks like this might be bug 6523336,
>> http://sunsolve.sun.com/search/document.do?assetkey=1-66-201229-1
>> Does anyone know when the "Binary relief" for this and other Sol10 ZFS kernel panics will be released as normal kernel patches?
>> Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Parallel zfs destroy results in No more processes
> Do you have SATA Native Command Queuing enabled? I've experienced delays of just under one minute when NCQ is enabled that do not occur when NCQ is disabled. If all the threads comprising the parallel zfs destroy hang for a minute, I bet it's the hang that causes "No more processes". I have opened a trouble ticket on this issue and am waiting for feedback. In the meantime, I've disabled NCQ by adding the line below to /etc/system (and rebooting).
>
> set sata:sata_func_enable = 0x5

Not on this system. It is not clear to me how these timeout/disconnect problems would cause a call to fork() to fail, but I can give that a try the next time I need to delete that much data.

However, we have disabled NCQ through this mechanism on another system that was locking up ~1/week with several "device disconnected" messages. That system has been up for 2 weeks since disabling NCQ and has not displayed any disconnected messages since then.

Can anyone confirm that 125205-07 has solved these NCQ problems? Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
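As an aside, one way to confirm after the reboot that the /etc/system setting actually took is to read the live kernel value back with mdb. A minimal sketch, assuming the sata module is loaded so the symbol resolves (the variable is the one from the set line above):

# print the sata driver's feature-enable mask; expect the 0x5 set above
echo 'sata`sata_func_enable/X' | mdb -k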
[zfs-discuss] X4500 device disconnect problem persists
After applying 125205-07 on two X4500 machines running Sol10U4 and removing set sata:sata_func_enable = 0x5 from /etc/system to re-enable NCQ, I am again observing drive disconnect error messages. This is in spite of the patch description, which claims multiple fixes in this area, for example:

6587133 repeated DMA command timeouts and device resets on x4500
6538627 x4500 message logs contain multiple device disk resets but nothing logged in FMA
6564956 Disparity error for marvell88sx3 was shown during boot-time

Has anyone else had any better luck with this? Thanks.

Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: no matching NCQ I/O found
Oct 26 16:25:34 thumper2 marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx3: device on port 1 reset: device disconnected or device error
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: device reset
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: link lost
Oct 26 16:25:34 thumper2 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct 26 16:25:34 thumper2  port 1: link established
Oct 26 16:25:34 thumper2 marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 1:
Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info]        device disconnected
Oct 26 16:25:34 thumper2 marvell88sx: [ID 517869 kern.info]        device connected
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd25):
Oct 26 16:25:34 thumper2  Error for Command: read(10)    Error Level: Retryable
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   Requested Block: 521002402    Error Block: 521002402
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   Vendor: ATA    Serial Number:
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   Sense Key: No Additional Sense
Oct 26 16:25:34 thumper2 scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] Parallel zfs destroy results in No more processes
On Wed, Oct 24, 2007 at 10:40:41AM -0700, David Bustos wrote:
> Quoth Stuart Anderson on Sun, Oct 21, 2007 at 07:09:10PM -0700:
> > Running 102 parallel zfs destroy -r commands on an X4500 running S10U4 has resulted in "No more processes" errors in existing login shells for several minutes of time, but then fork() calls started working again. However, none of the zfs destroy processes have actually completed yet, which is odd since some of the filesystems are trivially small.
> > ...
> > Is this a known issue? Any ideas on what resource lots of zfs commands use up to prevent fork() from working?
>
> ZFS is known to use a lot of memory. I suspect this problem has diminished in recent Nevada builds. Can you try this on Nevada?

I suspect it is more subtle than this, since top was reporting that none of the available swap space was being used, so there was 16GB of free VM. Unfortunately, I am not currently in a position to switch this system over to Nevada. Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
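For anyone trying to reproduce this, two quick ways to see whether VM reservation or kernel memory is actually the squeeze when fork() starts failing. A sketch using stock S10 tools (swap -s and the mdb ::memstat dcmd; run as root):

# virtual memory allocated, reserved, and still available
swap -s

# physical memory breakdown, including kernel pages (where the ZFS ARC lives)
echo '::memstat' | mdb -k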
[zfs-discuss] Parallel zfs destroy results in No more processes
Running 102 parallel zfs destroy -r commands on an X4500 running S10U4 has resulted in "No more processes" errors in existing login shells for several minutes at a time, but then fork() calls started working again. However, none of the zfs destroy processes have actually completed yet, which is odd since some of the filesystems are trivially small. After fork() started working there were hardly any processes other than the 102 zfs destroys running on the system, i.e.,

# ps -ef | wc -l
154

Here is a snapshot of top that looks reasonable; note especially that free swap is 16GB and that the last pid is still in the range of the ~100 zfs commands being run.

Is this a known issue? Any ideas on what resource lots of zfs commands use up to prevent fork() from working? Thanks.

last pid: 11473;  load avg: 0.35, 0.87, 0.68;  up 9+00:21:42    18:56:38
148 processes: 146 sleeping, 1 zombie, 1 on cpu
CPU states: 94.2% idle, 0.0% user, 5.8% kernel, 0.0% iowait, 0.0% swap
Memory: 16G phys mem, 1029M free mem, 16G total swap, 16G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 11333 root       1  59    0 3188K  772K cpu/3    0:01  0.02% top
   622 noaccess  28  59    0  172M 4528K sleep    4:28  0.01% java
   528 root       1  59    0   20M 5092K sleep    2:44  0.01% Xorg
   431 root      11  59    0 5620K 1248K sleep    0:01  0.01% syslogd
   565 root       1  59    0   10M 1384K sleep    0:53  0.00% dtgreet
   206 root       1 100  -20 2068K 1128K sleep    0:21  0.00% xntpd
 10864 root       1  59    0 7416K 1216K sleep    0:00  0.00% sshd
     7 root      14  59    0   12M  680K sleep    0:05  0.00% svc.startd
   158 root      33  59    0 6864K 1616K sleep    0:15  0.00% nscd
   312 root       1  59    0 1112K  660K sleep    0:00  0.00% utmpd
   340 root       3  59    0 3932K 1312K sleep    0:00  0.00% inetd
   582 root      22  59    0   17M 2028K sleep    5:49  0.00% fmd
 11432 root       1  59    0 4556K 1496K sleep    0:30  0.00% zfs
 11449 root       1  59    0 4556K 1496K sleep    0:27  0.00% zfs
 11360 root       1  59    0 4552K 1492K sleep    0:26  0.00% zfs

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
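"No more processes" is the shell's wording for fork() failing with EAGAIN, so besides free VM it may be worth checking whether the process table or a per-user limit was briefly pegged. A sketch of what to look at (max_nprocs, maxuprc, and nproc are standard Solaris kernel variables; sar is stock S10):

# configured ceiling, per-user ceiling, and current process count
echo 'max_nprocs/D' | mdb -k
echo 'maxuprc/D' | mdb -k
echo 'nproc/D' | mdb -k

# watch process-table utilization over time (proc-sz column)
sar -v 5 5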
Re: [zfs-discuss] chgrp -R hangs all writes to pool
On Mon, Jul 16, 2007 at 09:36:06PM -0700, Stuart Anderson wrote:
> Running Solaris 10 Update 3 on an X4500 I have found that it is possible to reproducibly block all writes to a ZFS pool by running chgrp -R on any large filesystem in that pool. As can be seen in the zpool iostat output below, after about 10-sec of running the chgrp command all writes to the pool stop, and the pool starts exclusively running a slow background task of 1kB reads. At this point the chgrp -R command is not killable via root kill -9, and in fact even the command halt -d does not do anything.

For posterity, this appears to have been fixed in S10U4; at least, I am unable to reproduce the problem that was easy to trigger with S10U3. Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] zpool degraded status after resilver completed
Possibly related is the fact that fmd is now in a CPU spin loop constantly checking the time, even though there are no reported faults, i.e.,

# fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty

# svcs fmd
STATE          STIME    FMRI
online         13:11:43 svc:/system/fmd:default

# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   422 root       17M   13M run     11    0  20:42:51  19% fmd/22

# truss -p 422 | head -20
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    lwp_park(0xFDB7DF40, 0)                 Err#62 ETIME
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453
/13:    time()                                  = 1189279453

Is this a known bug with fmd and ZFS? Thanks.

On Fri, Sep 07, 2007 at 08:55:52PM -0700, Stuart Anderson wrote:
> I am curious why zpool status reports a pool to be in the DEGRADED state after a drive in a raidz2 vdev has been successfully replaced. In this particular case drive c0t6d0 was failing, so I ran
>
> zpool offline home c0t6d0
> zpool replace home c0t6d0 c8t1d0
>
> and after the resilvering finished the pool reports a degraded state. Hopefully this is incorrect. At this point, is the vdev in question now fully raidz2 protected even though it is listed as DEGRADED?

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
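If anyone picks this up, the user-level stacks of the spinning threads would probably pin down the loop faster than the truss output. A minimal sketch (pstack is stock Solaris; 422 is the fmd pid from the prstat output above):

# dump the user-level stacks of all of fmd's threads
pstack 422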
[zfs-discuss] zpool degraded status after resilver completed
I am curious why zpool status reports a pool to be in the DEGRADED state after a drive in a raidz2 vdev has been successfully replaced. In this particular case drive c0t6d0 was failing, so I ran

zpool offline home c0t6d0
zpool replace home c0t6d0 c8t1d0

and after the resilvering finished the pool reports a degraded state. Hopefully this is incorrect. At this point, is the vdev in question now fully raidz2 protected even though it is listed as DEGRADED?

P.S. This is on a pool created on S10U3 and upgraded to ZFS version 4 after upgrading the host to S10U4.

Thanks.

# zpool status
  pool: home
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in
        a degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed with 0 errors on Fri Sep 7 18:39:03 2007
config:

        NAME          STATE     READ WRITE CKSUM
        home          DEGRADED     0     0     0
          raidz2      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c5t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c8t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c8t3d0    ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c8t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c8t5d0    ONLINE       0     0     0
          raidz2      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              c0t6d0  OFFLINE      0     0     0
              c8t1d0  ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     0
            c8t6d0    ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
            c7t7d0    ONLINE       0     0     0
            c8t7d0    ONLINE       0     0     0
        spares
          c8t1d0      INUSE     currently in use

errors: No known data errors

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
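For what it's worth, a pool is apparently expected to report DEGRADED for as long as a hot spare is standing in for an offlined disk, even though the raidz2 vdev has full redundancy through the spare. A sketch of the usual cleanup, assuming c0t6d0 really is gone for good: detaching the old disk promotes the spare to a permanent member and should return the pool to ONLINE.

# remove the offlined disk from the config; c8t1d0 then ceases to be a "spare"
zpool detach home c0t6d0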
[zfs-discuss] Kernel panic receiving incremental snapshots
Before I open a new case with Sun, I am wondering if anyone has seen this kernel panic before? It happened on an X4500 running Sol10U3 while it was receiving incremental snapshot updates. Thanks.

Aug 25 17:01:50 ldasdata6 ^Mpanic[cpu0]/thread=fe857d53f7a0:
Aug 25 17:01:50 ldasdata6 genunix: [ID 895785 kern.notice] dangling dbufs (dn=fe82a3532d10, dbuf=fe8b4e338b90)
Aug 25 17:01:50 ldasdata6 unix: [ID 10 kern.notice]
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112a80 zfs:zfsctl_ops_root+2fa59a42 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112ac0 zfs:dmu_objset_evict_dbufs+e2 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112af0 zfs:dmu_objset_evict+30 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b10 zfs:zfsctl_ops_root+2fa5c0e1 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b30 zfs:dbuf_evict_user+44 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b60 zfs:zfsctl_ops_root+2fa4de31 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112b90 zfs:dsl_dataset_close+56 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112bb0 zfs:dmu_objset_close+1d ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d20 zfs:dmu_recvbackup+5b5 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d40 zfs:zfs_ioc_recvbackup+45 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d80 zfs:zfsdev_ioctl+146 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112d90 genunix:cdev_ioctl+1d ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112db0 specfs:spec_ioctl+50 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112de0 genunix:fop_ioctl+25 ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112ec0 genunix:ioctl+ac ()
Aug 25 17:01:50 ldasdata6 genunix: [ID 655072 kern.notice] fe8001112f10 unix:sys_syscall32+101 ()
Aug 25 17:01:50 ldasdata6 unix: [ID 10 kern.notice]
Aug 25 17:01:50 ldasdata6 genunix: [ID 672855 kern.notice] syncing file systems...
Aug 25 17:01:51 ldasdata6 genunix: [ID 904073 kern.notice] done
Aug 25 17:01:52 ldasdata6 genunix: [ID 111219 kern.notice] dumping to /dev/md/dsk/d3, offset 1645084672, content: kernel

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] chgrp -R hangs all writes to pool
On Tue, Jul 17, 2007 at 03:08:44PM +1000, James C. McPherson wrote:
> Log a new case with Sun, and make sure you supply a crash dump so people who know ZFS can analyze the issue. You can use stop-A sync, break sync, or reboot -dq

That does appear to have caused a panic/kernel dump. However, I cannot find the dump image after rebooting to Solaris, even though savecore appears to be configured:

# reboot -dq
Jul 17 12:27:35 x4500gc reboot: rebooted by root

panic[cpu2]/thread=9823c460: forced crash dump initiated at user request

fe8000e18d60 genunix:kadmin+4b4 ()
fe8000e18ec0 genunix:uadmin+93 ()
fe8000e18f10 unix:sys_syscall32+101 ()

syncing file systems... 1 1 done
dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
100% done: 3268790 pages dumped, compression ratio 12.39, dump succeeded
rebooting...

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/md/dsk/d2 (swap)
Savecore directory: /var/crash/x4500gc
  Savecore enabled: yes

# ls -laR /var/crash/x4500gc/
/var/crash/x4500gc/:
total 2
drwx------   2 root     root         512 Jul 12 16:26 .
drwxr-xr-x   3 root     root         512 Jul 12 16:26 ..

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
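When savecore fails to run at boot, the dump image is usually still sitting on the dump device and can be pulled off by hand. A sketch of what I would try (standard savecore(1M) flags; the directory is the one dumpadm reports above):

# -v: verbose; -d: save the dump even if the header claims it was already saved
savecore -vd /var/crash/x4500gc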
Re: [zfs-discuss] chgrp -R hangs all writes to pool
It looks like there is a problem dumping a kernel panic on an X4500. During the self-induced panic, there were additional syslog messages that indicate a problem writing to the two disks that make up /dev/md/dsk/d2 in my case. It is as if the SATA controllers are being reset during the crash dump. At any rate, I will send this all to Sun support. Thanks.

Jul 17 12:27:35 x4500gc unix: [ID 836849 kern.notice]
Jul 17 12:27:35 x4500gc ^Mpanic[cpu2]/thread=9823c460:
Jul 17 12:27:35 x4500gc genunix: [ID 156897 kern.notice] forced crash dump initiated at user request
Jul 17 12:27:35 x4500gc unix: [ID 10 kern.notice]
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fe8000e18d60 genunix:kadmin+4b4 ()
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fe8000e18ec0 genunix:uadmin+93 ()
Jul 17 12:27:35 x4500gc genunix: [ID 655072 kern.notice] fe8000e18f10 unix:sys_syscall32+101 ()
Jul 17 12:27:35 x4500gc unix: [ID 10 kern.notice]
Jul 17 12:27:35 x4500gc genunix: [ID 672855 kern.notice] syncing file systems...
Jul 17 12:27:35 x4500gc genunix: [ID 733762 kern.notice] 1
Jul 17 12:27:37 x4500gc last message repeated 1 time
Jul 17 12:27:38 x4500gc genunix: [ID 904073 kern.notice] done
Jul 17 12:27:39 x4500gc genunix: [ID 111219 kern.notice] dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
Jul 17 12:27:39 x4500gc marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 0:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device disconnected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device connected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        SError interrupt
Jul 17 12:27:39 x4500gc marvell88sx: [ID 131198 kern.info]        SErrors:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Recovered communication error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                PHY ready change
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                10-bit to 8-bit decode error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Disparity error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx3: error on port 4:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device disconnected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        device connected
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]        SError interrupt
Jul 17 12:27:39 x4500gc marvell88sx: [ID 131198 kern.info]        SErrors:
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Recovered communication error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                PHY ready change
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                10-bit to 8-bit decode error
Jul 17 12:27:39 x4500gc marvell88sx: [ID 517869 kern.info]                Disparity error
Jul 17 12:28:39 x4500gc genunix: [ID 409368 kern.notice] ^M100% done: 3268790 pages dumped, compression ratio 12.39,
Jul 17 12:28:39 x4500gc genunix: [ID 851671 kern.notice] dump succeeded
Jul 17 12:30:38 x4500gc genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_125101-10 64-bit
Jul 17 12:30:38 x4500gc genunix: [ID 943907 kern.notice] Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.

On Tue, Jul 17, 2007 at 12:40:16PM -0700, Stuart Anderson wrote:
> On Tue, Jul 17, 2007 at 03:08:44PM +1000, James C. McPherson wrote:
> > Log a new case with Sun, and make sure you supply a crash dump so people who know ZFS can analyze the issue. You can use stop-A sync, break sync, or reboot -dq
>
> That does appear to have caused a panic/kernel dump. However, I cannot find the dump image after rebooting to Solaris, even though savecore appears to be configured:
>
> # reboot -dq
> Jul 17 12:27:35 x4500gc reboot: rebooted by root
>
> panic[cpu2]/thread=9823c460: forced crash dump initiated at user request
>
> fe8000e18d60 genunix:kadmin+4b4 ()
> fe8000e18ec0 genunix:uadmin+93 ()
> fe8000e18f10 unix:sys_syscall32+101 ()
>
> syncing file systems... 1 1 done
> dumping to /dev/md/dsk/d2, offset 3436511232, content: kernel
> 100% done: 3268790 pages dumped, compression ratio 12.39, dump succeeded
> rebooting...
>
> # dumpadm
>       Dump content: kernel pages
>        Dump device: /dev/md/dsk/d2 (swap)
> Savecore directory: /var/crash/x4500gc
>   Savecore enabled: yes
>
> # ls -laR /var/crash/x4500gc/
> /var/crash/x4500gc/:
> total 2
> drwx------   2 root     root         512 Jul 12 16:26 .
> drwxr-xr-x   3 root     root         512 Jul 12 16:26 ..
>
> Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
[zfs-discuss] chgrp -R hangs all writes to pool
in the output of dmesg, svcs -xv, or fmdump associated with this event. Is this a known issue or should I open a new case with Sun? Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] chgrp -R hangs all writes to pool
On Tue, Jul 17, 2007 at 02:49:08PM +1000, James C. McPherson wrote:
> Stuart Anderson wrote:
> > Running Solaris 10 Update 3 on an X4500 I have found that it is possible to reproducibly block all writes to a ZFS pool by running chgrp -R on any large filesystem in that pool. As can be seen in the zpool iostat output below, after about 10-sec of running the chgrp command all writes to the pool stop, and the pool starts exclusively running a slow background task of 1kB reads.
> > ...
> > Is this a known issue or should I open a new case with Sun?
>
> Log a new case with Sun, and make sure you supply a crash dump so people who know ZFS can analyze the issue. You can use stop-A sync, break sync, or reboot -dq

In previous attempts, neither halt -d nor reboot (with no arguments) were able to shut down the machine. Is reboot -dq really a bigger hammer than halt -d?

Sorry to be pedantic, but what is the exact key sequence on a Sun USB keyboard one should use to force a kernel dump on Solx86? Since there is no OBP on an X4500, where do I type the sync command?

Thanks.

--
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
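As I understand it, the x86 substitute for OBP's stop-A/sync is the kernel debugger rather than a bare keyboard chord. A sketch of the usual sequence, assuming console access and that loading kmdb on the box is acceptable (mdb -K, the F1-A break sequence, and the $<systemdump macro are the stock Solaris x86 mechanisms, but verify the exact steps against the docs for your release):

# load kmdb on the running system; this drops straight into the
# debugger at the console, and :c resumes normal operation
mdb -K

# later, when the machine wedges: F1-A at the console breaks back
# into kmdb, and this macro forces a panic plus crash dump
$<systemdump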