[zfs-discuss] panic after zfs mount
Dear all,

We ran into a nasty problem the other day. One of our mirrored zpools hosts
several ZFS filesystems. After a reboot (all FS mounted and in use at that
time) the machine panicked (console output further down). After detaching one
of the mirrors, the pool fortunately imported automatically in a faulted
state without mounting the filesystems. Offlining the unplugged device and
clearing the fault allowed us to disable auto-mounting of the filesystems.
Going through them one by one, all but one mounted OK. That one again
triggered a panic. We have left mounting of that one disabled for now, to be
back in production after pulling data from the backup tapes.

Scrubbing didn't show any errors, so any idea what's behind the problem?
Any chance to fix the FS?

Thomas

---

panic[cpu3]/thread=ff0503498400: BAD TRAP: type=e (#pf Page fault)
rp=ff001e937320 addr=20 occurred in module "zfs" due to a NULL pointer
dereference

zfs: #pf Page fault
Bad kernel fault at addr=0x20
pid=27708, pc=0xf806b348, sp=0xff001e937418, eflags=0x10287
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 20  cr3: 4194a7000  cr8: c

    rdi: ff0503aaf9f0  rsi: 0             rdx: 0
    rcx: 155cda0b      r8:  eaa325f0      r9:  ff001e937480
    rax: 7ff           rbx: 0             rbp: ff001e937460
    r10: 7ff           r11: 0             r12: ff0503aaf9f0
    r13: ff0503aaf9f0  r14: ff001e9375d0  r15: ff001e937610
    fsb: 0             gsb: ff04e7e5c040  ds:  4b
    es:  4b            fs:  0             gs:  1c3
    trp: e             err: 0             rip: f806b348
    cs:  30            rfl: 10287         rsp: ff001e937418
    ss:  38

ff001e937200 unix:die+dd ()
ff001e937310 unix:trap+177e ()
ff001e937320 unix:cmntrap+e6 ()
ff001e937460 zfs:zap_leaf_lookup_closest+40 ()
ff001e9374f0 zfs:fzap_cursor_retrieve+c9 ()
ff001e9375b0 zfs:zap_cursor_retrieve+19a ()
ff001e937780 zfs:zfs_purgedir+4c ()
ff001e9377d0 zfs:zfs_rmnode+52 ()
ff001e937810 zfs:zfs_zinactive+b5 ()
ff001e937860 zfs:zfs_inactive+ee ()
ff001e9378b0 genunix:fop_inactive+af ()
ff001e9378d0 genunix:vn_rele+5f ()
ff001e937ac0 zfs:zfs_unlinked_drain+af ()
ff001e937af0 zfs:zfsvfs_setup+fb ()
ff001e937b50 zfs:zfs_domount+16a ()
ff001e937c70 zfs:zfs_mount+1e4 ()
ff001e937ca0 genunix:fsop_mount+21 ()
ff001e937e00 genunix:domount+ae3 ()
ff001e937e80 genunix:mount+121 ()
ff001e937ec0 genunix:syscall_ap+8c ()
ff001e937f10 unix:brand_sys_sysenter+1eb ()
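(A note on the isolation procedure described above: the per-filesystem
auto-mount switch is the canmount property. A minimal sketch, assuming a
placeholder pool "tank" with filesystems fs1 and fs2; the names here are
illustrative, not from the original report:)

    # Keep filesystems from mounting automatically on boot/import
    zfs set canmount=noauto tank/fs1
    zfs set canmount=noauto tank/fs2

    # Then mount them one at a time to find the one that panics
    zfs mount tank/fs1
    zfs mount tank/fs2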
Re: [zfs-discuss] Reconfiguring a RAID-Z dataset
Thomas Burgess wrote:
>> Yeah, this is what I was thinking too... Is there any way to retain
>> snapshot data this way? I've read about the ZFS replay/mirror features,
>> but my impression was that this was more for a development mirror for
>> testing rather than a reliable backup? This is the only way I know of
>> that one could do something like this. Is there some other way to
>> create a solid clone, particularly with a machine that won't have the
>> same drive configuration?
>
> I recently used zfs send/recv to copy a bunch of datasets from a raidz2
> box to a box made of mirrors. It works fine.

ZFS send/recv looks very cool and very convenient. I wonder what it was that
I read that suggested not relying on it for backups? Maybe it was alluding
to the notion that, like relying on RAID for a backup, if there is
corruption your mirror (i.e. the machine you are using with zfs recv) will
be corrupted too? At any rate, thanks for answering this question! At some
point, if I go this route, I'll test send and recv functionality to give all
of this a dry run.

--
Joe Auty, NetMusician
NetMusician helps musicians, bands and artists create beautiful,
professional, custom designed, career-essential websites that are easy to
maintain and to integrate with popular social networks.
www.netmusician.org
j...@netmusician.org
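(To the question of retaining snapshots: a recursive replication stream
carries them along. A minimal sketch, assuming placeholder pool, dataset,
and snapshot names:)

    # Snapshot the whole dataset tree, then send it with -R, which
    # includes descendant datasets, their properties, and their snapshots
    zfs snapshot -r oldpool/data@migrate1
    zfs send -R oldpool/data@migrate1 | zfs receive -d newpool

    # Over the network, e.g. via ssh (host name is a placeholder):
    zfs send -R oldpool/data@migrate1 | ssh newhost zfs receive -d newpool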
Re: [zfs-discuss] panic after zfs mount
Thomas Nau wrote:
> Dear all,
> We ran into a nasty problem the other day. One of our mirrored zpools
> hosts several ZFS filesystems. After a reboot the machine panicked
> (console output in the original post). [...] Going through them one by
> one, all but one mounted OK. That one again triggered a panic. [...]
> Scrubbing didn't show any errors, so any idea what's behind the problem?
> Any chance to fix the FS?

We had the same problem. Victor pointed me to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788

with a workaround to mount the filesystem read-only to save the data. I
still hope to figure out the chain of events that causes this. Did you use
any extended attributes on this filesystem?

-- Arne
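(A sketch of that read-only workaround, with a placeholder pool/filesystem
name. The idea, per the bug report, is that a read-only mount avoids the
unlinked-set processing (the zfs_unlinked_drain frame in the panic stack)
that trips the trap:)

    # Mount the damaged filesystem read-only so the data can be copied off
    zfs set readonly=on tank/broken
    zfs mount tank/broken

    # Then copy the contents to a healthy dataset or another machine, e.g.
    # rsync -a /tank/broken/ /tank/rescue/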
Re: [zfs-discuss] panic after zfs mount
Thanks for the link, Arne.

On 06/13/2010 03:57 PM, Arne Jansen wrote:
> We had the same problem. Victor pointed me to
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788
> with a workaround to mount the filesystem read-only to save the data. I
> still hope to figure out the chain of events that causes this. Did you
> use any extended attributes on this filesystem?

To my knowledge we haven't used any extended attributes, but I'll
double-check after mounting the filesystem read-only. As it's one that's
exported using Samba, it might indeed be the case. For sure, a lot of ACLs
are used.

Thomas
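(For the double-check: Solaris ls and find can spot extended attributes
directly. A sketch with placeholder paths:)

    # Files with extended attributes show an '@' in the mode column
    ls -@ /tank/broken/somefile

    # List a file's extended-attribute namespace
    runat /tank/broken/somefile ls -l

    # Scan a whole tree for files that carry extended attributes
    find /tank/broken -xattr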
Re: [zfs-discuss] Homegrown Hybrid Storage
On Jun 8, 2010, at 12:46 PM, Miles Nordin wrote:
>> re == Richard Elling richard.ell...@gmail.com writes:
>
>    re> Please don't confuse Ethernet with IP.
>
> okay, but I'm not. seriously, if you'll look into it. [fine whine elided]

I think we can agree that the perfect network has yet to be invented :-)
Meanwhile, 6Gbps SAS switches are starting to hit the market... what fun :-)

>    re> The latest OpenSolaris release is 2009.06, which treats all
>    re> zvol-backed COMSTAR iSCSI writes as sync. This was changed in the
>    re> developer releases in summer 2009, b114. For a release such as
>    re> NexentaStor 3.0.2, which is based on b140 (+/-), the initiator's
>    re> write cache enable/disable request is respected by default.
>
> that helps a little, but it's far from a full enough picture to be
> useful to anyone IMHO. In fact it's pretty close to ``it varies and is
> confusing'' which I already knew:
>
> * how do I control the write cache from the initiator? though I think I
>   already know the answer: ``it depends on which initiator,'' and ``oh,
>   you're using that one? well i don't know how to do it with THAT
>   initiator'' == YOU DON'T

For ZFS over a Solaris initiator, it is done by setting DKIOCSETWCE via an
ioctl. Look on or near
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_disk.c#276
I presume that this can also be set with format -e, as is done for other
devices. Has anyone else tried?

> * when the setting has been controlled, how long does it persist? Where
>   can it be inspected?

RTFM stmfadm(1m) and look for wcd.
<small_rant> drives me nuts that some people prefer negatives (disables)
over positives (enables) </small_rant>

> * ``by default'' == there is a way to make it not respect the
>   initiator's setting, and through a target shell command cause it to
>   use one setting or the other, persistently?

See above.

> * is the behavior different for file-backed LUNs than zvols?

Yes, it can be. It can also be modified by the sync property. See CR
6794730, need zvol support for DKIOCSETWCE and friends
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794730

> I guess there is less point to figuring this out until the behavior is
> settled.

I think it is settled, but perhaps not well documented :-(
 -- richard
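(Since the stmfadm(1m) pointer is terse: wcd is a logical-unit property, so
inspection and persistent changes go through list-lu and modify-lu. A
sketch; the GUID below is a made-up placeholder:)

    LU=600144f000000000000000000000beef   # placeholder LU GUID

    # Show current LU properties, including "Writeback Cache Disabled"
    stmfadm list-lu -v $LU

    # wcd=false enables the write cache; wcd=true disables it
    # (hence the rant about negatives above)
    stmfadm modify-lu -p wcd=false $LU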
Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving back
I found a thread that mentions an undocumented parameter, -V
(http://opensolaris.org/jive/thread.jspa?messageID=444810), and that did the
trick! The pool is now online and seems to be working well.

Thanks to everyone who helped!
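(For the record, the invocation from that thread is just the extra flag on
import; the pool name is a placeholder. Note the follow-ups below on why -V
is a crutch rather than a cure:)

    # Undocumented: per that thread, lets the import proceed where it
    # would otherwise fail because of a missing vdev. Last resort only.
    zpool import -V mypool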
Re: [zfs-discuss] panic after zfs mount
Arne,

On 06/13/2010 03:57 PM, Arne Jansen wrote:
> We had the same problem. Victor pointed me to
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6742788
> with a workaround to mount the filesystem read-only to save the data. I
> still hope to figure out the chain of events that causes this. Did you
> use any extended attributes on this filesystem?

Mounting the FS read-only worked, thanks again. I checked the attributes,
and the set for all files is:

{archive,nohidden,noreadonly,nosystem,noappendonly,nonodump,noimmutable,av_modified,noav_quarantined,nonounlink}

so just the default ones.

Thomas
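(That brace-delimited set is what Solaris ls reports as ZFS system
attributes, which are distinct from named extended attributes. A sketch with
a placeholder path:)

    # -/ c prints system attributes in compact {archive,nohidden,...} form
    ls -/ c /tank/broken/somefile

    # whereas named extended attributes would show as '@' in the mode column
    ls -@ /tank/broken/somefile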
Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving back
On Jun 13, 2010, at 8:09 AM, Jan Hellevik wrote:
> I found a thread that mentions an undocumented parameter -V
> (http://opensolaris.org/jive/thread.jspa?messageID=444810) and that did
> the trick! The pool is now online and seems to be working well.

-V is a crutch, not a cure.
 -- richard
Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving back
Well, for me it was a cure. Nothing else I tried got the pool back. As far
as I can tell, the way to get it back should have been to use symlinks to
the fdisk partitions on my SSD, but that did not work for me. Using -V got
the pool back. What is wrong with that?

If you have a better suggestion as to how I should have recovered my pool, I
am certainly interested in hearing it.
Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving back
On 6/13/2010 11:14 AM, Jan Hellevik wrote:
> Well, for me it was a cure. Nothing else I tried got the pool back. As
> far as I can tell, the way to get it back should have been to use
> symlinks to the fdisk partitions on my SSD, but that did not work for
> me. Using -V got the pool back. What is wrong with that? If you have a
> better suggestion as to how I should have recovered my pool I am
> certainly interested in hearing it.

I think Richard meant that -V isn't the real solution to your problem; the
real fix is to address the underlying issue with fdisk partition
recognition. -V is undocumented for a reason, and likely to go poof and
disappear at any time, so we shouldn't rely on it to solve this issue.

For you, though, it obviously worked in this case. The message being: don't
count on this being a general solution if you get in this situation again.
But at least you're back in business. :-)

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Moved disks to new controller - cannot import pool even after moving back
On Jun 13, 2010, at 12:38 PM, Erik Trimble wrote:
> I think Richard meant that -V isn't the real solution to your problem,
> which is to fix the underlying issue with fdisk partition recognition.

yes.

> -V is undocumented for a reason, and likely to go poof and disappear at
> any time, so we shouldn't rely on it to solve this issue.

I wouldn't worry about it going away, but the option is only useful when a
vdev is missing. It does not cause a missing vdev to reappear, so your
problem is not solved.
 -- richard
Re: [zfs-discuss] Zpool import not working
Thank you. The -D option works. And yes, now I feel a lot more confident
about playing around with the FS. I'm planning on moving an existing RAID-1
NTFS setup to ZFS, but since I'm on a budget I only have three drives in
total to work with. I want to make sure I know what I'm doing before I mess
around with anything.

Also, I can confirm that the cache flush option is not ALWAYS needed for the
import. I have OpenSolaris build 134 in VirtualBox, but I didn't enable
cache flush, and after destroying the pool the import worked correctly with
the -D option. I emphasize "always" because if you are writing to the disk
while you destroy it, it may not work very well; I haven't tested this.

Thanks for your help.
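(A sketch of the destroyed-pool recovery path being discussed, with a
placeholder pool name:)

    # A destroyed pool remains importable until its disks are overwritten
    zpool import -D           # list destroyed pools that are recoverable
    zpool import -D mypool    # re-import one of them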
[zfs-discuss] Unable to Install 2009.06 on BigAdmin Approved MOBO - FILE SYSTEM FULL
Hi guys,

I am having trouble installing OpenSolaris 2009.06 on my Biostar TPower I45
motherboard, approved on the BigAdmin HCL here:
http://www.sun.com/bigadmin/hcl/data/systems/details/26409.html
-- why is it not working?

My setup:
3x 1TB SATA hard drives
1x 500GB hard drive (I have left only this hdd connected to try to isolate
the issue; it still happens)
4GB DDR2 PC2-6400 RAM (tested GOOD!)
ATI Radeon 4650 512MB DDR2 PCI-E 16x
Motherboard default settings / CMOS cleared

Here's what happens: the OpenSolaris boot options come up, and I choose the
first, default "OpenSolaris 2009.06" entry (I have also tried the VESA
driver and command line entries; all of these fail). After "Select desktop
language":

configuring devices.
Mounting cdroms
Reading ZFS config: done.

opensolaris console login: (CD-ROM is still being accessed at this time)

A few seconds later:

opensolaris ufs: NOTICE: alloc: /: file system full
opensolaris last message repeated 1 time
opensolaris syslogd: /var/adm/messages: No space left on device
opensolaris in.routed[537]: route 0.0.0.0/8 -> 0.0.0.0 nexthop is not
directly connected

I logged in as jack/jack on the console and did a df -h:

/devices/ramdisk:a               164M  100% used       mounted on /
swap                             3.3G  860K used (1%)
/mnt/misc/opt                    210M  100% used       mounted on /mnt/misc
/usr/lib/libc/libc_hwcap1.so.1   2.3G  100% used       mounted on /lib/libc.so.1
/dev/dsk/c7t0d0s2                677M  100% used       mounted on /media/OpenSolaris

Thanks for any help!
[zfs-discuss] Dedup performance hit
Hello,

I tried enabling dedup on a filesystem and moved files into it to take
advantage of it. I had about 700GB of files and left it for some hours. When
I returned, only 70GB had been moved. I checked zpool iostat, and it showed
about 8MB/s R/W performance (the old and new ZFS filesystems are in the same
pool). So I disabled dedup for a few seconds, and instantly the performance
jumped to 80MB/s.

It's an Athlon64 X2 machine with 4GB RAM, used only as a fileserver (4x 1TB
SATA for ZFS). arcstat.pl shows 2G for arcsz; top shows 13% CPU during the
8MB/s transfers.

Is this normal behavior? Should I always expect such low performance, or is
there anything wrong with my setup?

Thanks in advance,
Hernan
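(For anyone reproducing this, a sketch of the measurements mentioned, with a
placeholder pool name:)

    # Per-vdev bandwidth and IOPS, sampled every 5 seconds
    zpool iostat -v tank 5

    # ARC size straight from the kernel, an alternative to arcstat.pl
    echo "::arc" | mdb -k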
Re: [zfs-discuss] Dedup performance hit
Hernan F wrote:
> I tried enabling dedup on a filesystem, and moved files into it to take
> advantage of it. I had about 700GB of files and left it for some hours.
> When I returned, only 70GB were moved. [...] So I disabled dedup for a
> few seconds and instantly the performance jumped to 80MB/s. It's an
> Athlon64 X2 machine with 4GB RAM [...] Is this normal behavior?

You are severely RAM limited.

In order to do dedup, ZFS has to maintain a catalog of every single block it
writes and the checksum for that block. This is called the Dedup Table (DDT
for short). So, during the copy, ZFS has to (a) read a block from the old
filesystem, (b) check the current DDT to see if that block exists, and (c)
either write the block to the new filesystem (and add an appropriate DDT
entry for it), or write a metadata update with the dedup block reference.

Likely, you have two problems:

(1) I suspect your source filesystem has lots of blocks (that is, it's
likely made up of smaller-sized files). Lots of blocks means lots of seeking
back and forth to read all those blocks.

(2) Lots of blocks also means lots of entries in the DDT. It's trivial to
overwhelm a 4GB system with a large DDT. If the DDT can't fit in RAM, then
it has to get partially refreshed from disk.

Thus, here's what's likely going on:

(1) ZFS reads a block and its checksum from the old filesystem.
(2) It checks the DDT to see if that checksum exists.
(3) Finding that the entire DDT isn't resident in RAM, it starts a cycle to
read the rest of the (potential) entries from the new filesystem's metadata.
That is, it tries to reconstruct the DDT from disk, which involves a HUGE
amount of random seek reads on the new filesystem.

In essence, since you likely can't fit the DDT in RAM, each block read from
the old filesystem forces a flurry of reads from the new filesystem. Which
eats up the IOPS that your single pool can provide. It thrashes the disks.

Your solution is to either buy more RAM, or find something you can use as an
L2ARC cache device for your pool. Ideally, it would be an SSD. However, in
this case, a plain hard drive would do OK (NOT one already in a pool). To
add such a device, you would do:

    zpool add tank cache mycachedevice

(Note the "cache" keyword: without it, zpool add would try to attach the
device as a top-level data vdev, which is not what you want.)

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
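(A sketch of gauging the DDT pressure Erik describes, using zdb with a
placeholder pool name; the per-entry byte figure is an assumption, the
commonly quoted ballpark, not a measured value:)

    # Histogram/summary of the dedup table on a pool already using dedup
    zdb -DD tank

    # Simulate dedup on a pool that isn't using it yet, to estimate the
    # table size and the achievable dedup ratio
    zdb -S tank

    # Rough RAM math: entries x ~270 bytes/entry = in-core DDT footprint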
Re: [zfs-discuss] Dedup performance hit
Howdy all,

I too dabbled with dedup and found the performance poor with only 4GB of
RAM. I've since disabled dedup and find the performance better, but zpool
list still shows a 1.15x dedup ratio. Is this still a hit on disk I/O
performance? Aside from copying the data off and back onto the filesystem,
is there another way to de-dedup the pool?

Thanks,
John

On Jun 13, 2010, at 10:17 PM, Erik Trimble wrote:
> You are severely RAM limited. In order to do dedup, ZFS has to maintain
> a catalog of every single block it writes and the checksum for that
> block. This is called the Dedup Table (DDT for short). [...] Your
> solution is to either buy more RAM, or find something you can use as an
> L2ARC cache device for your pool.
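(One caveat worth making explicit: setting dedup=off only affects new
writes, so existing blocks stay deduplicated, and the DDT stays around,
until they are rewritten. A sketch of checking the ratio and rewriting a
dataset within the pool, with placeholder names; this is still a copy, just
done with send/recv instead of file tools:)

    zpool get dedupratio tank        # drops back to 1.00x only once no
                                     # deduped blocks remain

    zfs set dedup=off tank/data
    zfs snapshot tank/data@flatten
    zfs send tank/data@flatten | zfs receive tank/data.new
    # verify the copy, then swap the datasets:
    zfs destroy -r tank/data
    zfs rename tank/data.new tank/data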