Re: [zfs-discuss] new labelfix needed
Your point is only rhetorical. Systems break regardless of the resources you put into building them: bad hardware, typos, human mistakes, bugs. This mailing list is full of examples. Having tools like zdb, mdb, zfs import -fFX, and labelfix for analysis and repair is always a good thing. BTW, a zfsck would be a great improvement to ZFS.

bbr
[zfs-discuss] Algorithm of block size estimation
Hi,

Can anybody point me to the code snippet that estimates the block size? I want to know when ZFS decides on the block size used for a file. Does ZFS estimate it based on the length of the file when the file's create event is committed to disk during the txg commit? If so, can the block size of a file change during its lifetime? I suppose the answer is no.

Thank you for your kind help!

best regards,
hanzhu
Re: [zfs-discuss] Algorithm of block size estimation
On 02/09/2010 11:18, Zhu Han wrote:
> Can anybody point me to the code snippet that estimates the block size?

See the zfs_write() function.

-- Darren J Moffat
Re: [zfs-discuss] Algorithm of block size estimation
Thank you! Here is my understanding; I'll leave it here as a reference for others. If it isn't correct, please point it out.

ZFS re-estimates the block size only while the file still consists of a single block and is being extended, because dmu_object_set_blocksize() only changes the block size when the object contains a single block. Once the file grows beyond a single block, the block size is fixed and is never shrunk. ZFS sets the block size to the power-of-2 ceiling of the file size, with the largest block size capped by the file system's recordsize property. One special case: if the current block size of the file is not a power of 2, the new block size can be greater than the recordsize property. Does anybody know why there is such a short circuit?

best regards,
hanzhu

On Thu, Sep 2, 2010 at 6:30 PM, Darren J Moffat darr...@opensolaris.org wrote:
> On 02/09/2010 11:18, Zhu Han wrote:
>> Can anybody point me to the code snippet that estimates the block size?
>
> See the zfs_write() function.
>
> -- Darren J Moffat
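To make the rule above concrete, here is a minimal sketch of the growth logic as described in this thread -- it is not the actual zfs_write()/dmu_object_set_blocksize() code. The function name pick_blocksize and the 512-byte starting size are illustrative assumptions; in ZFS the adjustment is only attempted while the object still consists of a single block.

/*
 * Sketch only: new block size = power-of-2 ceiling of the file length,
 * capped at the dataset's recordsize.  Applied only while the file is
 * still a single block.
 */
#include <stdint.h>

#define MIN_BLOCKSIZE 512   /* smallest ZFS block size, for illustration */

static uint64_t
pick_blocksize(uint64_t eof, uint64_t recordsize)
{
        uint64_t blksz = MIN_BLOCKSIZE;

        if (eof >= recordsize)
                return (recordsize);    /* capped by the recordsize property */

        while (blksz < eof)
                blksz <<= 1;            /* power-of-2 ceiling of the file size */

        return (blksz);
}

For example, a 5 KB file would get an 8 KB block, a 200 KB file would get the full 128 KB recordsize (with two blocks), under the default recordsize of 128 KB.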
Re: [zfs-discuss] new labelfix needed
On Wed, 1 Sep 2010, Benjamin Brumaire wrote:
> Your point is only rhetorical.

I'm not sure what you mean by that. I was asking specifically about your situation. You want to run labelfix on /dev/rdsk/c0d1s4 - what happened to that slice that requires a labelfix? Is there something that zfs might be doing to cause the problem? Is there something that zfs could be doing to mitigate the problem?

> BTW, a zfsck would be a great improvement to ZFS.

What specifically would zfsck do that is not done by scrub?
Re: [zfs-discuss] pool died during scrub
> Looks similar to a crash I had here at our site a few months ago. Same symptoms, no actual solution. We had to recover from an rsync backup server.

Thanks Carsten. And on Sun hardware, too. Boy, that's comforting. Three-way mirrors, anyone?
[zfs-discuss] what is zfs doing during a log resilver?
So, when you add a log device to a pool, it initiates a resilver. What is it actually doing, though? Isn't the slog a copy of the in-memory intent log? Wouldn't it simply replicate the data that's in the other log, checked against what's in RAM? And presumably there isn't that much data in the slog, so there isn't that much to check? Or is it just doing a generic resilver because you changed something?
[zfs-discuss] What forum to use with a ZFS how-to question
Is this the right forum to post a ZFS how-to question? If not, which forum would you suggest I go to?
Re: [zfs-discuss] What forum to use with a ZFS how-to question
This is the right forum, fire away...

Feel free to review ZFS information in advance:
http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs

ZFS Administration Guide (Solaris 10):
http://docs.sun.com/app/docs/doc/819-5461

ZFS Best Practices Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

cs

On 09/02/10 07:42, Dominik Hoffmann wrote:
> Is this the right forum to post a ZFS how-to question? If not, which forum would you suggest I go to?
Re: [zfs-discuss] 4k block alignment question (X-25E)
On Tue, Aug 31, 2010 at 12:47:49PM -0700, Brandon High wrote:
> On Mon, Aug 30, 2010 at 3:05 PM, Ray Van Dolson rvandol...@esri.com wrote:
>> I want to fix (as much as is possible) a misalignment issue with an X-25E that I am using both for the OS and as a slog device.
>
> It's pretty easy to get the alignment right. fdisk uses a default geometry of 63/255/*, which isn't easy to change. This makes each cylinder 63 * 255 * 512b. You want (cylinder_offset * 63 * 255 * 512b) / block_alignment_size to divide evenly. For a 4k alignment you want the offset to be 8.
>
> With fdisk, create your SOLARIS2 partition so that it uses the entire disk. The partition will run from cylinder 1 to whatever. Cylinder 0 is used for the MBR, so it's automatically un-aligned. When you create slices in format, the MBR cylinder isn't visible, so you have to subtract 1 from the offset; your first slice should therefore start on cylinder 7. Each additional slice should start on a cylinder that is a multiple of 8, minus 1, e.g. 63, 1999, etc. It doesn't matter if the end of a slice is unaligned, other than to make aligning the next slice easier.
>
> -B

Thanks Brandon. Just a follow-up to my original post... unfortunately I couldn't try aligning the slice on the SSD I was also using for slog/ZIL. The slog/ZIL slice was too small to be added to the ZIL mirror, as the disk we'd thrown in the system bypassing the expander was being used completely (via an EFI label).

Still wanted to test, however, so I pulled one of the drives from my rpool and added the entire disk to my mirror. This uses the EFI label and aligns everything correctly. Unit Attention errors immediately began showing up. I pulled that drive from the ZIL mirror and then used one of my two L2ARC drives (also X-25Es) in the same fashion. Same problem.

So I believe the problem is still expander related more so than alignment related. Too bad.

Ray
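To double-check Brandon's arithmetic, here is a small throwaway C program (not a partitioning tool; the loop bound and output format are purely illustrative) that prints which cylinder boundaries under the 63/255/512b geometry land on a 4 KiB boundary. Only cylinders whose number is a multiple of 8 do, which is where the offset of 8 comes from, and why in format (with the MBR cylinder hidden) the slices start at 7, 63, and so on.

/*
 * Arithmetic check: bytes-per-cylinder for 63 sectors/track, 255 heads,
 * 512-byte sectors, versus a 4 KiB alignment target.
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
        const uint64_t cyl_bytes = 63ULL * 255 * 512;   /* 8,225,280 bytes per cylinder */
        const uint64_t align = 4096;                    /* 4 KiB pages on the X-25E */

        for (uint64_t cyl = 1; cyl <= 16; cyl++) {
                uint64_t start = cyl * cyl_bytes;       /* byte offset where this cylinder starts */
                printf("cylinder %2llu starts at byte %10llu: %s\n",
                    (unsigned long long)cyl, (unsigned long long)start,
                    (start % align == 0) ? "4k-aligned" : "not aligned");
        }
        return (0);
}

Running it shows only cylinders 8 and 16 aligned in that range, since 8 * 8,225,280 / 4096 = 16,065 exactly.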
Re: [zfs-discuss] How to rebuild raidz after system reinstall
What does 'zpool import' show? If that's empty, what about 'zpool import -d /dev'?
Re: [zfs-discuss] How to rebuild raidz after system reinstall
I just tried:

admin$ zpool replace BackupRAID /dev/disk0 /dev/disk1 /dev/disk2
too many arguments

As you can see, it didn't do what I need to accomplish.
Re: [zfs-discuss] How to rebuild raidz after system reinstall
I think I just destroyed the information on the old raidz members by doing

zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2

The pool mounted fine after that, but is empty. None of the old information is present. Am I right?
Re: [zfs-discuss] new labelfix needed
My company changed our SAN, and I migrated our zpool to the new LUNs with attach/detach while the zpool stayed online. Once that finished we had a cluster crash, and the zpool (on the new LUNs) got corrupted. There is no way to import it; zpool import -fFX failed. The old LUNs are detached and probably sane from a data point of view... and I've already successfully recovered a detached disk with help from Jeff Bonwick. I guess a zfsck or zpoolck could help when a zpool can't be imported.
Re: [zfs-discuss] How to rebuild raidz after system reinstall
On Thu, 2 Sep 2010, Dominik Hoffmann wrote:
> I think I just destroyed the information on the old raidz members by doing
> zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2

It should have warned you that two of the disks were already formatted with a zfs pool. Did it not do that? If it didn't, perhaps these aren't the same disks you were using in your pool.
Re: [zfs-discuss] How to rebuild raidz after system reinstall
There was no warning. This is the output:

admin$ sudo zpool create BackupRAID raidz /dev/disk0s2 /dev/disk1s2 /dev/disk2s2
Password:
admin$ zpool status
  pool: BackupRAID
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        BackupRAID    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            disk0s2   ONLINE       0     0     0
            disk1s2   ONLINE       0     0     0
            disk2s2   ONLINE       0     0     0

errors: No known data errors
Re: [zfs-discuss] How to rebuild raidz after system reinstall
I can only tell you what it is now:

admin$ zpool import
no pools available to import
admin$ zpool import -d /dev
no pools available to import
Re: [zfs-discuss] How to rebuild raidz after system reinstall
Also, I am quite sure that I was using the actual drives from the old raidz. This is my drive listing:

admin$ diskutil list
/dev/disk0
   #:                      TYPE NAME              SIZE       IDENTIFIER
   0:     GUID_partition_scheme                  *465.8 Gi   disk0
   1:                       EFI                   200.0 Mi   disk0s1
   2:                       ZFS BackupRAID        465.4 Gi   disk0s2
/dev/disk1
   #:                      TYPE NAME              SIZE       IDENTIFIER
   0:     GUID_partition_scheme                  *465.8 Gi   disk1
   1:                       EFI                   200.0 Mi   disk1s1
   2:                       ZFS BackupRAID        465.4 Gi   disk1s2
/dev/disk2
   #:                      TYPE NAME              SIZE       IDENTIFIER
   0:     GUID_partition_scheme                  *465.8 Gi   disk2
   1:                       EFI                   200.0 Mi   disk2s1
   2:                       ZFS                   465.4 Gi   disk2s2

/dev/disk2 is the new drive.
[zfs-discuss] Information lost? Does zpool create erase volumes
Those of you who have read my previous post know that I was trying to reassemble a raidz after a complete reinstall of the OS on a Mac running zfs-119. In a fit of impatience, I executed the zpool create command on the three volumes, two of which were part of the old raidz, the third one having replaced the third old one, which had died. This did succeed in creating a raidz on my system, but all the original information is gone, leading me to believe that I caused the old information to be erased or at least inaccessible. Can any of you please clue me in?
Re: [zfs-discuss] Information lost? Does zpool create erase volumes
Dominik,

You overwrote your data when you recreated a pool with the same name and the same disks with zpool create.

If I try to recreate a pool that already exists, even one that has merely been exported, I see a message similar to the following:

# zpool create tank c3t3d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t3d0s0 is part of exported or potentially active ZFS pool tank. Please see zpool(1M).

Did you attempt to import the pool prior to the zpool create?

Thanks,
Cindy

On 09/02/10 14:17, Dominik Hoffmann wrote:
> Those of you who have read my previous post know that I was trying to reassemble a raidz after a complete reinstall of the OS on a Mac running zfs-119. In a fit of impatience, I executed the zpool create command on the three volumes, two of which were part of the old raidz, the third one having replaced the third old one, which had died. This did succeed in creating a raidz on my system, but all the original information is gone, leading me to believe that I caused the old information to be erased or at least inaccessible. Can any of you please clue me in?
Re: [zfs-discuss] Information lost? Does zpool create erase volumes
Yes, I did try to import the pool. However, the response of the command was "no pools available to import".
Re: [zfs-discuss] Information lost? Does zpool create erase volumes
> Yes, I did try to import the pool. However, the response of the command was "no pools available to import".

I'm not sure what happened to your pool, but I think it is possible that the pool information on these disks was removed accidentally. I'm not sure what the diskutil command does, but if it was run on the wrong disk, then possibly all the pool info was removed from the disks' labels. This is just a guess.

Then, when you reissued the zpool create command, it didn't complain that the disks were already in use because the pool info had already been removed.

Cindy
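One way to see whether any ZFS label data survives on a disk is to dump the labels with zdb -l on the raw device. For the curious, below is a rough hand-rolled sketch that merely scans the four vdev label areas for the uberblock magic. It is not zdb; the label layout it assumes (four 256 KiB labels, two at each end of the device, uberblock ring in the last 128 KiB of each) follows the ZFS on-disk format, while the 1 KiB scan step and the program structure are illustrative assumptions only.

/*
 * Crude check for surviving ZFS label data: count candidate uberblocks
 * (entries starting with magic 0x00bab10c, either byte order) in each
 * of the four label areas of a device.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LABEL_SIZE      (256 * 1024)
#define UB_RING_OFF     (128 * 1024)
#define UB_MAGIC        0x00bab10cULL

int
main(int argc, char **argv)
{
        static char buf[LABEL_SIZE - UB_RING_OFF];      /* 128 KiB uberblock ring */

        if (argc != 2) {
                fprintf(stderr, "usage: %s <raw-device>\n", argv[0]);
                return (1);
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return (1);
        }
        off_t devsize = lseek(fd, 0, SEEK_END);
        if (devsize < 4 * LABEL_SIZE) {
                fprintf(stderr, "device too small or size unknown\n");
                return (1);
        }
        off_t label_off[4] = { 0, LABEL_SIZE,
            devsize - 2 * LABEL_SIZE, devsize - LABEL_SIZE };

        for (int l = 0; l < 4; l++) {
                int hits = 0;
                if (pread(fd, buf, sizeof (buf),
                    label_off[l] + UB_RING_OFF) != (ssize_t)sizeof (buf))
                        continue;
                /* Look for the uberblock magic at every 1 KiB boundary. */
                for (size_t off = 0; off < sizeof (buf); off += 1024) {
                        uint64_t magic;
                        memcpy(&magic, buf + off, sizeof (magic));
                        if (magic == UB_MAGIC ||
                            magic == __builtin_bswap64(UB_MAGIC))
                                hits++;
                }
                printf("label %d: %d possible uberblocks\n", l, hits);
        }
        close(fd);
        return (0);
}

If all four counts come back zero, it is a strong hint that the labels really were overwritten; zdb -l remains the authoritative way to inspect them.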
[zfs-discuss] possible ZFS-related panic?
Folks,

Has anyone seen a panic traceback like the following? This is Solaris-10u7 on a Thumper, acting as an NFS server. The machine was up for nearly a year; I added a dataset to an existing pool, set compression=on for the first time on this system, loaded some data into it (via rsync), then mounted it on the NFS client. The first data was written by the client itself in a 10pm cron job, and the system crashed at 10:02pm as below:

panic[cpu2]/thread=fe8000f5cc60: page_sub: bad arg(s): pp 872b5610, *ppp 0

fe8000f5c470 unix:mutex_exit_critical_size+20219 ()
fe8000f5c4b0 unix:page_list_sub_pages+161 ()
fe8000f5c510 unix:page_claim_contig_pages+190 ()
fe8000f5c600 unix:page_geti_contig_pages+44b ()
fe8000f5c660 unix:page_get_contig_pages+c2 ()
fe8000f5c6f0 unix:page_get_freelist+1a4 ()
fe8000f5c760 unix:page_create_get_something+95 ()
fe8000f5c7f0 unix:page_create_va+2a1 ()
fe8000f5c850 unix:segkmem_page_create+72 ()
fe8000f5c8b0 unix:segkmem_xalloc+60 ()
fe8000f5c8e0 unix:segkmem_alloc_vn+8a ()
fe8000f5c8f0 unix:segkmem_alloc+10 ()
fe8000f5c9c0 genunix:vmem_xalloc+315 ()
fe8000f5ca20 genunix:vmem_alloc+155 ()
fe8000f5ca90 genunix:kmem_slab_create+77 ()
fe8000f5cac0 genunix:kmem_slab_alloc+107 ()
fe8000f5caf0 genunix:kmem_cache_alloc+e9 ()
fe8000f5cb00 zfs:zio_buf_alloc+1d ()
fe8000f5cb50 zfs:zio_compress_data+ba ()
fe8000f5cba0 zfs:zio_write_compress+78 ()
fe8000f5cbc0 zfs:zio_execute+60 ()
fe8000f5cc40 genunix:taskq_thread+bc ()
fe8000f5cc50 unix:thread_start+8 ()

syncing file systems... done
. . .

Unencumbered by more than a gut feeling, I disabled compression on the dataset, and we've gotten through two nightly runs of the same NFS client job without crashing, but of course we would technically have to wait for nearly a year before we've exactly replicated the original situation (:-).

Unfortunately the dump slice was slightly too small; we were just short of enough space to capture the whole 10GB crash dump. I did get savecore to write something out, and I uploaded it to the Oracle support site, but it gives scat too much indigestion to be useful to the engineer I'm working with. They have not found any matching bugs so far, so I thought I'd ask a slightly wider audience here.

Thanks and regards, Marion