Re: [zfs-discuss] zil and root on the same SSD disk
Whenever I create a root pool, i.e. configure a pool using the c?t?d?s0 notation, it always complains about overlapping slices, since *s2 is the entire disk. This warning seems excessive, but -f will override it.

As for the ZIL: the first time around I created a dedicated slice for it, which worked well. The second time I used a zvol instead:

# zfs create -V 2G rpool/slog
# zfs set refreservation=2G rpool/slog

        NAME      STATE   READ WRITE CKSUM
        rpool     ONLINE     0     0     0
          c9d0s0  ONLINE     0     0     0

  pool: zpool
 state: ONLINE
config:
        NAME                        STATE   READ WRITE CKSUM
        zpool                       ONLINE     0     0     0
          raidz1-0                  ONLINE     0     0     0
            c8t0d0                  ONLINE     0     0     0
            c8t1d0                  ONLINE     0     0     0
            c8t2d0                  ONLINE     0     0     0
            c8t3d0                  ONLINE     0     0     0
            c8t4d0                  ONLINE     0     0     0
        logs
          /dev/zvol/dsk/rpool/slog  ONLINE     0     0     0

I prefer the zvol approach now, as I can potentially change its size and reboot, whereas slices are much more static. I don't know how the two compare performance-wise, but right now the NAS is fast enough (the NIC is the slowest part).

--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Mirroring raidz ?
I have a server with two external drive cages attached, on separate controllers:

c0::dsk/c0t0d0   disk  connected  configured  unknown
c0::dsk/c0t1d0   disk  connected  configured  unknown
c0::dsk/c0t2d0   disk  connected  configured  unknown
c0::dsk/c0t3d0   disk  connected  configured  unknown
c0::dsk/c0t4d0   disk  connected  configured  unknown
c0::dsk/c0t5d0   disk  connected  configured  unknown
c0::dsk/c0t6d0   disk  connected  configured  unknown
c0::dsk/c0t7d0   disk  connected  configured  unknown
c0::dsk/c0t8d0   disk  connected  configured  unknown
c0::dsk/c0t9d0   disk  connected  configured  unknown
c0::dsk/c0t10d0  disk  connected  configured  unknown
c0::dsk/c0t11d0  disk  connected  configured  unknown
c1::dsk/c1t1d0   disk  connected  configured  unknown
c1::dsk/c1t2d0   disk  connected  configured  unknown
c1::dsk/c1t3d0   disk  connected  configured  unknown
c1::dsk/c1t4d0   disk  connected  configured  unknown
c1::dsk/c1t5d0   disk  connected  configured  unknown
c1::dsk/c1t6d0   disk  connected  configured  unknown
c1::dsk/c1t7d0   disk  connected  configured  unknown
c1::dsk/c1t8d0   disk  connected  configured  unknown
c1::dsk/c1t9d0   disk  connected  configured  unknown
c1::dsk/c1t10d0  disk  connected  configured  unknown
c1::dsk/c1t11d0  disk  connected  configured  unknown

It would be nice to create a setup similar to:

zpool create sub1 raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
zpool add sub1 raidz c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0
zpool create sub2 raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
zpool add sub2 raidz c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0
zpool create pool mirror sub1 sub2

That way I could lose a HDD in either external drive cage, or indeed a whole external drive cage (controller/cable/power), without downtime. But I have a feeling I can not do this? What would be recommended?
[zfs-discuss] ZFS panic on blade BL465c G1
Hello list,

I got a c7000 with BL465c G1 blades to play with, and have been trying to get some form of Solaris to work on it. However, this is the state:

OpenSolaris 134:  Installs with ZFS, but no BNX NIC drivers.
OpenIndiana 147:  Panics on zpool create every time, even from console. Has no UFS option, has NICs.
Solaris 10 u9:    Panics on zpool create, but has a UFS option, has NICs.

One option would be to get 147 NIC drivers for 134. But for now, the ZFS panic happens on create. The blade has an HP Smart Array E200i card in it, with both HDDs set up as single-HDD logical volumes:

# format
AVAILABLE DISK SELECTIONS:
    0. c1t0d0 <DEFAULT cyl 17841 alt 2 hd 255 sec 63>
       /p...@2,0/pci1166,1...@11/pci1166,1...@0/pci103c,3...@8/s...@0,0
    1. c1t1d0 <DEFAULT cyl 17841 alt 2 hd 255 sec 63>
       /p...@2,0/pci1166,1...@11/pci1166,1...@0/pci103c,3...@8/s...@1,0
1; p; p
Total disk cylinders available: 17841 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       1 - 17840     136.66GB    (17840/0/0) 286599600
  1 unassigned    wu       0                    0    (0/0/0)             0
  2     backup    wm       0 - 17840     136.67GB    (17841/0/0) 286615665
  3 unassigned    wm       0                    0    (0/0/0)             0
  4 unassigned    wm       0                    0    (0/0/0)             0
  5 unassigned    wm       0                    0    (0/0/0)             0
  6 unassigned    wm       0                    0    (0/0/0)             0
  7 unassigned    wm       0                    0    (0/0/0)             0
  8       boot    wu       0 -     0       7.84MB    (1/0/0)         16065

# zpool create -f zboot c1t1d0s0

panic[cpu2]/thread=fe80011a2c60: BAD TRAP: type=e (#pf Page fault) rp=fe80011a2940 addr=278 occurred in module unix due to a NULL pointer dereference

sched: #pf Page fault
Bad kernel fault at addr=0x278
pid=0, pc=0xfb8406fb, sp=0xfe80011a2a38, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 278  cr3: 1161f000  cr8: c
rdi:      278  rsi:         4  rdx: fe80011a2c60
rcx:       14   r8:         0   r9:             0
rax:        0  rbx:       278  rbp: fe80011a2a60
r10:        0  r11:         1  r12:           10
r13:        0  r14:         4  r15:     9bb02ef0
fsb:        0  gsb:  8ac7a800   ds:           43
 es:       43   fs:         0   gs:          1c3
trp:        e  err:         2  rip:     fb8406fb
 cs:       28  rfl:     10246  rsp: fe80011a2a38
 ss:       30

fe80011a2850 unix:die+da ()
fe80011a2930 unix:trap+5e6 ()
fe80011a2940 unix:cmntrap+140 ()
fe80011a2a60 unix:mutex_enter+b ()
fe80011a2a70 zfs:zio_buf_alloc+1d ()
fe80011a2aa0 zfs:zio_vdev_io_start+120 ()
fe80011a2ad0 zfs:zio_execute+7b ()
fe80011a2af0 zfs:zio_nowait+1a ()
fe80011a2b60 zfs:vdev_probe+f0 ()
fe80011a2ba0 zfs:vdev_open+2b1 ()
fe80011a2bc0 zfs:vdev_open_child+21 ()
fe80011a2c40 genunix:taskq_thread+295 ()
fe80011a2c50 unix:thread_start+8 ()

syncing file systems...
Re: [zfs-discuss] opensolaris lightweight install
On my NAS I use Velitium: http://sourceforge.net/projects/velitium/ which goes down to about 70MB at the smallest.

(2010/01/07 15:23), Frank Cusack wrote:
> been searching and searching ...

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] quotas on zfs at solaris 10 update 9 (10/09)
Len Zaifman wrote:
> Because we have users who will create millions of files in a directory, it would be nice to report the number of files a user or a group has in a filesystem. Is there a way (other than find) to get this?

I don't know if there is a good way, but I have noticed that with ZFS, the number in ls which used to be the block count actually reports the number of entries in the directory (-1):

drwxr-xr-x  13 root  bin  13 Oct 28 02:58 spool
                          ^^
# ls -la spool | wc -l
14

Which means you can probably add things up a little faster.
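Failing a built-in counter, the tally can be scripted outside ZFS. A minimal sketch (Python; `files_per_uid` and the walk-based approach are my own illustration, not something from this thread) that counts files per owning uid in one pass over a tree:

```python
import os
from collections import Counter

def files_per_uid(root):
    """Count regular directory entries per owning uid under root."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            counts[st.st_uid] += 1
    return counts
```

Like `find`, this still walks every inode, so it does not scale to millions of files any better; it only saves forking an external process per directory.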
Re: [zfs-discuss] Replacing log with SSD on Sol10 u8
OK, the logfix program compiled for snv_111 does run, and lets me swap the 32GB HDD slog for the new SSD (~29GB) slog. It comes up as faulty, but I can replace it with itself, and then everything is OK. I can attach the second SSD without issues. Assuming it never tries to write the full 32GB, it should be fine. I don't know if zpool stores the physical size in the label, or checks it when importing.

# zpool export zpool1
# ./logfix /dev/rdsk/c5t1d0s0 /dev/rdsk/c10t4d0s0 13049515403703921770
# zpool import zpool1
# zpool status
        logs
          13049515403703921770  FAULTED  0  0  0  was /dev/dsk/c10t4d0s0
# zpool replace -f zpool1 13049515403703921770 c10t4d0
# zpool status
        logs
          c10t4d0  ONLINE  0  0  0
# zpool attach zpool1 c10t4d0 c9t4d0
        logs
          mirror-1   ONLINE  0  0  0
            c10t4d0  ONLINE  0  0  0
            c9t4d0   ONLINE  0  0  0

And back in Solaris 10 u8:

# zpool import zpool1
# zpool status
        logs
          mirror    ONLINE  0  0  0
            c6t4d0  ONLINE  0  0  0
            c5t4d0  ONLINE  0  0  0

So it does at least have a solution, even if it is a rather unattractive one. 12 servers, and it has to be done at 2am, means I will be testy for a while.

Lund

Jorgen Lundman wrote:
> Interesting. Unfortunately, I can not zpool offline, nor zpool detach, nor zpool remove the existing c6t4d0s0 device.
>
> I thought perhaps we could boot something newer than b125 [*1] and then I would be able to remove the slog device that is too big. The dev-127.iso does not boot [*2] due to splashimage, so I had to edit the ISO to remove that for booting. After booting with -B console=ttya, I find that it can not add the /dev/dsk entries for the 24 HDDs, since / is on a too-small ramdisk. Disk-full messages ensue. Yay!
>
> After I have finally imported the pools, without upgrading (since I have to boot back to Sol 10 u8 for production), I attempt to remove the slog that is no longer needed:
>
> # zpool remove zpool1 c6t4d0s0
> cannot remove c6t4d0s0: pool must be upgraded to support log removal
>
> Sigh.
>
> Lund
>
> [*1] http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6574286
> [*2] http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6739497
Re: [zfs-discuss] rquota did not show userquota (Solaris 10)
Hopefully this will confirm to you that it should work:

x4500-10:~# zfs get userqu...@57564 zpool1/sd02_www
NAME             PROPERTY         VALUE  SOURCE
zpool1/sd02_www  userqu...@57564  29.5G  local

prov01# df -h | grep sd02
x4500-10.unix:/export/sd02/www   16T  839G  15T  6%  /export/sd02/www
remote/read/write/setuid/devices/vers=4/hard/intr/quota/xattr/dev=4700038

# quota -v 57564
Disk quotas for (no account) (uid 57564):
Filesystem          usage     quota     limit  timeleft  files  quota  limit  timeleft
/export/sd02/www       15  30932992  30932992               0      0      0

I would suggest the usual things to check:

online  Aug_20  svc:/network/nfs/rquota:default

If you are using NFSv4, check that /var/run/nfs4_domain matches. We mount the volumes with -o rq (for legacy reasons, it was UFS at one time) but I don't know if that is still required.

Willi Burmeister wrote:
> Hi,
>
> we have a new fileserver running on X4275 hardware with Solaris 10U8. On this fileserver we created one test dir with quota and mounted it on another Solaris 10 system. Here the quota command did not show the used quota. Does this feature only work with OpenSolaris, or is it intended to work on Solaris 10?
>
> Here is what we did on the server:
>
> # zfs create -o mountpoint=/export/home2 zpool1/home
> # zfs set sharenfs=rw=sparcs zpool1/home
> # zfs set userqu...@wib=1m zpool1/home
> # mkdir /export/home2/wib
> # cp some stuff /export/home2/wib
> # chown -Rh wib:sysadmin /export/home2/wib
> # zfs userspace zpool1/home
> TYPE        NAME  USED  QUOTA
> POSIX User  root    3K   none
> POSIX User  wib   154K     1M
>
> # quota -v wib
> Disk quotas for wib (uid 90):
> Filesystem     usage  quota  limit  timeleft  files  quota  limit  timeleft
> /export/home2    154   1024   1024         -      -      -      -         -
>
> and the client:
>
> # mount server:/export/home2/wib /mnt
> % cd /mnt
> % du -sk .
> 154   .
> % quota -v wib
> Disk quotas for wib (uid 90):
> Filesystem     usage  quota  limit  timeleft  files  quota  limit  timeleft
>
> A simple snoop on the network shows us:
>
> client -> server  PORTMAP C GETPORT prog=100011 (RQUOTA) vers=1 proto=UDP
> server -> client  PORTMAP R GETPORT port=32865
> client -> server  RQUOTA C GETQUOTA Uid=90 Path=/export/home2/wib
> server -> client  RQUOTA R GETQUOTA No quota
>
> Why 'no quota'? Both systems are nearly fully patched. Any help is appreciated.
>
> Thanks in advance.
>
> Willi
Re: [zfs-discuss] Replacing log with SSD on Sol10 u8
Interesting. Unfortunately, I can not zpool offline, nor zpool detach, nor zpool remove the existing c6t4d0s0 device.

I thought perhaps we could boot something newer than b125 [*1] and then I would be able to remove the slog device that is too big. The dev-127.iso does not boot [*2] due to splashimage, so I had to edit the ISO to remove that for booting. After booting with -B console=ttya, I find that it can not add the /dev/dsk entries for the 24 HDDs, since / is on a too-small ramdisk. Disk-full messages ensue. Yay!

After I have finally imported the pools, without upgrading (since I have to boot back to Sol 10 u8 for production), I attempt to remove the slog that is no longer needed:

# zpool remove zpool1 c6t4d0s0
cannot remove c6t4d0s0: pool must be upgraded to support log removal

Sigh.

Lund

[*1] http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6574286
[*2] http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6739497
[zfs-discuss] Replacing log with SSD on Sol10 u8
Hello list,

I pre-created the pools we would use for when the SSDs eventually came in. Not my finest moment, perhaps. Since I knew the SSDs would be 32GB in size, I created 32GB slices on HDDs in slots 36 and 44.

* For future reference to others thinking of doing the same: do not bother setting up the log until you have the SSDs, or make the slices half the planned SSD size.

So the SSDs arrived, and I have a spare X4540 on which to attempt the replacement before we have to do it on all the production X4540s. Hopefully with no downtime.

SunOS x4500-15.unix 5.10 Generic_141445-09 i86pc i386 i86pc

        logs
          c5t4d0s0  ONLINE  0  0  0
          c6t4d0s0  ONLINE  0  0  0

# zpool detach zpool1 c5t4d0s0
# hdadm offline disk c5t4

This was very exciting: it is the first time EVER that the blue LED has turned on. Much rejoicing! ;)

Took slot 36 out, and inserted the first SSD. Lights came on green again, but just in case:

# hdadm online disk c5t4

I used format to fdisk it and change to an EFI label.

# zpool attach zpool1 c6t4d0s0 c5t4d0
cannot attach c5t4d0 to c6t4d0s0; the device is too small

Uh oh. Of course: I created a slice of 32GB literally, while the SSD's "32GB" is the old HDD marketing size. This has been fixed in OpenSolaris already (attaching smaller mirrors), but apparently not in Solaris 10 u8. I appear screwed. Are there patches to fix this perhaps? Hopefully? ;)

However, what I COULD do is add a new log device:

# zpool add zpool1 log c5t4d0
# zpool status
        logs
          c6t4d0s0  ONLINE  0  0  0
          c5t4d0    ONLINE  0  0  0

Interesting. Unfortunately, I can not zpool offline, nor zpool detach, nor zpool remove the existing c6t4d0s0 device. At this point we are essentially stuck. I would have to re-create the whole pool to fix this. With servers live and full of customer data, this would be awkward. So I switched to a more... direct approach. I also knew that if the log device fails, the pool falls back to using the default in-pool log.

# hdadm offline disk c6t4

Even though this says OK, it does not actually work since the device is in use. In the end, I simply pulled out the HDD. Since we had already added a second log device, there were no hiccups at all. It barely noticed it was gone.

        logs
          c6t4d0s0  UNAVAIL  0  0  0  corrupted data
          c5t4d0    ONLINE   0  0  0

At this point we inserted the second SSD, did the format for an EFI label, and were a little surprised that this worked:

# zpool attach zpool1 c5t4d0 c6t4d0

So now we have the situation of:

        logs
          c6t4d0s0   UNAVAIL  0  0  0  corrupted data
          mirror     ONLINE   0  0  0
            c5t4d0   ONLINE   0  0  0
            c6t4d0   ONLINE   0  0  0

It would be nice to get rid of c6t4d0s0 though. Any thoughts? What would you experts do in this situation? We have to run Solaris 10 (long battle there, no support for OpenSolaris from anyone in Japan). Can I delete the sucker using zdb?

Thanks for any reply,
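For what it's worth, the "device is too small" gap is easy to quantify. A quick sketch (Python; plain decimal-vs-binary arithmetic, my own illustration rather than a measurement from these servers) of why a "32 GB" SSD cannot replace a slice created as a literal 32 GB in format:

```python
# Decimal ("marketing") gigabytes vs binary gibibytes: format's "32gb" means
# 32 GiB, while SSD vendors count capacity in powers of ten.
GIB = 2**30          # one binary gibibyte
GB = 10**9           # one vendor "gigabyte"

slice_bytes = 32 * GIB   # the pre-created slog slice
ssd_bytes = 32 * GB      # the new SSD's advertised capacity

shortfall = slice_bytes - ssd_bytes
print("SSD is short by %.2f GiB" % (shortfall / GIB))  # prints: SSD is short by 2.20 GiB
```

Which also explains the earlier advice to make the slices half the planned SSD size: any comfortable margin avoids the mismatch entirely.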
Re: [zfs-discuss] ZFS directory and file quota
> In that case, instead of rewriting the part of my code which handles quota creation/updating/checking, I would need to completely rewrite the quota logic. :-(

So what do you do just now with UFS? Is it a separate filesystem for the mail directory? If so, it really shouldn't be that big a deal to rewrite it to run 'zfs set userquota@' instead of updating the UFS quota file. Certainly we changed our provisioning from using edquota to 'zfs set userquota' without issue.

All the read-only code uses rquota / the quota command to look up quota, works exactly the same with ZFS userquotas, and did not need any changes.
[zfs-discuss] ZFS dedup vs compression vs ZFS user/group quotas
We recently found that the ZFS user/group quota accounting for disk usage worked opposite to what we were expecting: any space saved by compression is a benefit to the customer, not to us. (We expected the Google style: give a customer a 2GB quota, and if compression saves space, that is profit to us.)

Is the space saved with dedup charged in the same manner? I would expect so; I figured some of you would just know. I will check when b128 is out.

I don't suppose I can change the model? :)

Lund
[zfs-discuss] ZFS user quota, userused updates?
Is there a way to force ZFS to update, or refresh in some way, the user quota/used value when it is not true to what is actually the case? Are there known ways to make it go out of sync that we should avoid?

SunOS x4500-11.unix 5.10 Generic_141445-09 i86pc i386 i86pc (Solaris 10 10/09 u8)

zpool1/sd01_mail  223M  15.6T  222M  /export/sd01/mail

# zfs userspace zpool1/sd01_mail
TYPE        NAME  USED   QUOTA
POSIX User  1029  54.0M  100M

# df -h .
Filesystem        size  used  avail  capacity  Mounted on
zpool1/sd01_mail  16T   222M  16T    1%        /export/sd01/mail

# ls -lhn
total 19600
-rw-------  1 1029 2100  1.7K Oct 20 12:03 1256007793.V4700025I1770M252506.vmx06.unix:2,S
-rw-------  1 1029 2100  1.7K Oct 20 12:04 1256007873.V4700025I1772M63715.vmx06.unix:2,S
-rw-------  1 1029 2100  1.6K Oct 20 12:05 1256007926.V4700025I1773M949133.vmx06.unix:2,S
-rw-------  1 1029 2100   76M Oct 20 12:23 1256009005.V4700025I1791M762643.vmx06.unix:2,S
-rw-------  1 1029 2100   54M Oct 20 12:36 1256009769.V4700034I179eM739748.vmx05.unix:2,S
-rw------T  1 1029 2100  2.0M Oct 20 14:39 file

The 54M file appears to be accounted for, but the 76M one is not. I recently added a 2M file via chown to see if it was a local-disk vs NFS problem. The previous value had not updated for 2 hours.

# zfs get useru...@1029 zpool1/sd01_mail
NAME              PROPERTY       VALUE  SOURCE
zpool1/sd01_mail  useru...@1029  54.0M  local

Any suggestions would be most welcome,

Lund
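In the meantime, one way to check whether the accounting is merely lagging is to compare `zfs userspace` against what is actually on disk. A hedged sketch (Python; `used_per_uid` is a name I made up for illustration, not an official tool) that sums per-uid on-disk usage from stat's block counts:

```python
import os
from collections import defaultdict

def used_per_uid(root):
    """Sum allocated bytes per owning uid under root, using st_blocks."""
    usage = defaultdict(int)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish mid-walk
            # st_blocks is counted in 512-byte units (POSIX)
            usage[st.st_uid] += st.st_blocks * 512
    return usage
```

A persistent gap between this sum and the userused@ value (beyond the few seconds of normal transaction-group delay) would point at stale accounting rather than at files still in flight.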
Re: [zfs-discuss] zfs on s10u8
We were holding our breath for ZFS user quotas, so we went to u8 and upgraded the pool immediately. No issues here. But I had to install from CD, as Live Upgrade failed (bootadm -e: no such argument).

ZFS send appears faster in u8 too, as it was still slow in u7.

Lund

dick hoogendijk wrote:
> Any known issues for the new ZFS on Solaris 10 update 8? Or is it still wiser to wait doing a zpool upgrade? Because older ABEs can no longer be accessed then.
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
I too went with a 5-in-3 case for HDDs, in a nice portable Mini-ITX case with an Intel Atom. More of a SOHO NAS for home use than a beast. Still, I can get about 10TB in it.

http://lundman.net/wiki/index.php/ZFS_RAID

I can also recommend the embeddedSolaris project for making a small bootable Solaris. Very flexible, and you can put on the admin GUIs, and so on.

https://sourceforge.net/projects/embeddedsolaris/

Lund
[zfs-discuss] Solaris License with ZFS USER quotas?
Hello list,

We are unfortunately still experiencing some issues regarding our support license with Sun, or rather with our Sun vendor.

We need ZFS user quotas (that's not the zfs file-system quota), which first appeared in snv_114. We would like to run something like snv_117 (we don't really care which version per se; that is just the version we have done the most testing with). But our vendor will only support Solaris 10. After weeks of wrangling, they have reluctantly agreed to let us run OpenSolaris 2009.06, which does not have ZFS user quotas.

When I approach Sun Japan directly, I am just told that they don't speak English. When my Japanese colleagues approach Sun Japan directly, it is suggested that we stay with our current vendor.

* Will there be official Solaris 10 or OpenSolaris releases with ZFS user quotas? (Will 2010.02 contain ZFS user quotas?)

* Could we perhaps get support overseas that would let us run a version of Solaris with ZFS user quotas?

Support generally means the ability to replace hardware when it dies, and/or to send panic dumps if they happen, for future patches.

Internally, we are now discussing returning our 12 x4540s and calling NetApp. I would rather not (more work for me). I understand Sun is probably experiencing some internal turmoil at the moment, but it has been rather frustrating for us.

Lund
Re: [zfs-discuss] Solaris License with ZFS USER quotas?
Tomas Ögren wrote:
> http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html which is in no way official, says it'll be in 10u8, which should be coming within a month.
>
> /Tomas

That would be perfect. I wonder why I have so much trouble finding information about future releases of Solaris.

Thanks

Lund
Re: [zfs-discuss] Finding SATA cards for ZFS; was Lundman home NAS
> The mv8 is a marvell based chipset, and it appears there are no Solaris drivers for it. There doesn't appear to be any movement from Sun or marvell to provide any either.

Do you mean specifically Marvell 6480 drivers? I use both the DAC-SATA-MV8 and AOC-SAT2-MV8, which use the Marvell MV88SX and work very well in Solaris (package SUNWmv88sx).

Lund
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
Nope, it does not.

Ian Collins wrote:
> Jorgen Lundman wrote:
>> Finally came to the reboot maintenance to reboot the x4540 to make it see the newly replaced HDD. I tried reboot, then power-cycle, and reboot -- -r, but I can not make the x4540 accept any HDD in that bay. I'm starting to think that perhaps we did not lose the original HDD, but rather the slot, and there is a hardware problem.
>>
>> This is what I see after a reboot; the disk is c1t5d0, sd37, s...@5,0, or slot 13.
>>
>> c1::dsk/c1t4d0  disk  connected  configured  unknown
>> c1::dsk/c1t5d0  disk  connected  configured  unknown
>> c1::dsk/c1t6d0  disk  connected  configured  unknown
>
> Does format show it?
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
            30, 2311 Aug 20 02:23 s...@4,0:wd,raw
drwxr-xr-x  2 root  sys         2 Aug  6 14:31 s...@5,0
drwxr-xr-x  2 root  sys         2 Apr 17 17:52 s...@6,0
brw-r-----  1 root  sys  30, 2432 Jul  6 09:50 s...@6,0:a
crw-r-----  1 root  sys  30, 2432 Jul  6 09:48 s...@6,0:a,raw
brw-r-----  1 root  sys  30, 2433 Jul  6 09:50 s...@6,0:b
crw-r-----  1 root  sys  30, 2433 Jul  6 09:48 s...@6,0:b,raw
brw-r-----  1 root  sys  30, 2434 Jul  6 09:50 s...@6,0:c
[snip]
brw-r-----  1 root  sys  30, 2452 Jul  6 09:50 s...@6,0:u
crw-r-----  1 root  sys  30, 2452 Jul  6 09:48 s...@6,0:u,raw
brw-r-----  1 root  sys  30, 2439 Aug 20 02:24 s...@6,0:wd
crw-r-----  1 root  sys  30, 2439 Aug 20 02:23 s...@6,0:wd,raw
drwxr-xr-x  2 root  sys         2 Apr 17 17:52 s...@7,0
brw-r-----  1 root  sys  30, 2496 Jul  2 15:30 s...@7,0:a
crw-r-----  1 root  sys  30, 2496 Jul  6 09:48 s...@7,0:a,raw
brw-r-----  1 root  sys  30, 2497 Jul  6 09:50 s...@7,0:b
crw-r-----  1 root  sys  30, 2497 Jul  6 09:48 s...@7,0:b,raw
brw-r-----  1 root  sys  30, 2498 Jul  6 09:50 s...@7,0:c
crw-r-----  1 root  sys  30, 2498 Jul  6 09:43 s...@7,0:c,raw
brw-r-----  1 root  sys  30, 2499 Jul  6 09:50 s...@7,0:d
crw-r-----  1 root  sys  30, 2499 Jul  6 09:48 s...@7,0:d,raw
brw-r-----  1 root  sys  30, 2500 Jul  6 09:50 s...@7,0:e
crw-r-----  1 root  sys  30, 2500 Jul  6 09:48 s...@7,0:e,raw

So it seems s...@5,0 is empty. It is peculiar that all other HDDs on c1tX work, though.

Eventually I noticed that cfgadm changes to:

c1::dsk/c1t4d0  disk  connected  configured  unknown
c1::dsk/c1t5d0  disk  connected  configured  failed
c1::dsk/c1t6d0  disk  connected  configured  unknown

We promoted the spare in use to replace c1t5d0, so now the pool looks like:

  pool: zpool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE   READ WRITE CKSUM
        zpool1      ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c0t3d0  ONLINE     0     0     0
            c1t3d0  ONLINE     0     0     0
            c2t3d0  ONLINE     0     0     0
            c3t3d0  ONLINE     0     0     0
            c4t3d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c5t3d0  ONLINE     0     0     0
            c0t7d0  ONLINE     0     0     0
            c1t7d0  ONLINE     0     0     0
            c2t7d0  ONLINE     0     0     0
            c3t7d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c2t0d0  ONLINE     0     0     0
            c3t0d0  ONLINE     0     0     0
            c4t0d0  ONLINE     0     0     0
            c5t0d0  ONLINE     0     0     0
            c0t6d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c1t6d0  ONLINE     0     0     0
            c2t6d0  ONLINE     0     0     0
            c3t6d0  ONLINE     0     0     0
            c4t6d0  ONLINE     0     0     0
            c5t6d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c0t1d0  ONLINE     0     0     0
            c1t1d0  ONLINE     0     0     0
            c2t1d0  ONLINE     0     0     0
            c3t1d0  ONLINE     0     0     0
            c4t1d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c5t1d0  ONLINE     0     0     0
            c0t5d0  ONLINE     0     0     0
            c4t7d0  ONLINE     0     0     0  [was c1t5d0]
            c2t5d0  ONLINE     0     0     0
            c3t5d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c4t5d0  ONLINE     0     0     0
            c5t5d0  ONLINE     0     0     0
            c0t2d0  ONLINE     0     0     0
            c1t2d0  ONLINE     0     0     0
            c2t2d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c3t2d0  ONLINE     0     0     0
            c4t2d0  ONLINE     0     0     0
            c5t2d0  ONLINE     0     0     0
            c0t4d0  ONLINE     0     0     0
            c1t4d0  ONLINE     0     0     0
          raidz1    ONLINE     0     0     0
            c2t4d0  ONLINE     0     0     0
            c3t4d0  ONLINE     0     0     0
            c4t4d0  ONLINE     0     0     0
            c5t4d0  ONLINE     0     0     0
            c5t7d0  ONLINE     0     0     0
Re: [zfs-discuss] Ssd for zil on a dell 2950
Does untarring something count? It is what I used for our tests. I tested with the ZIL disabled, the ZIL cache on /tmp/zil, a CF card (300x), and a cheap SSD. I am waiting for X25-E SSDs to arrive to test those:

http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html

If you want a quick answer, disable the ZIL (you need to unmount/mount, export/import, or reboot) on your ZFS volume and try it. That is the theoretical maximum. You can get close to this using various technologies, SSDs and all that. I am no expert on this; I knew nothing about it 2 weeks ago.

But for our provisioning engine untarring Movable Type for customers, going from 5 minutes to 45 seconds is quite an improvement. Theoretically I can get that to 11 seconds (ZIL disabled).

Lund

Monish Shah wrote:
> Hello Greg,
>
> I'm curious how much performance benefit you gain from the ZIL accelerator. Have you measured that? If not, do you have a gut feel about how much it helped? Also, for what kind of applications does it help? (I know it helps with synchronous writes. I'm looking for real-world answers like: "Our XYZ application was running like a dog; we added an SSD for ZIL and the response time improved by X%.")
>
> Of course, I would welcome a reply from anyone who has experience with this, not just Greg.
>
> Monish
>
> ----- Original Message -----
> From: Greg Mason <gma...@msu.edu>
> To: HUGE | David Stahl <dst...@hugeinc.com>
> Cc: zfs-discuss <zfs-discuss@opensolaris.org>
> Sent: Thursday, August 20, 2009 4:04 AM
> Subject: Re: [zfs-discuss] Ssd for zil on a dell 2950
>
>> Hi David,
>>
>> We are using them in our Sun X4540 filers. We are actually using 2 SSDs per pool, to improve throughput (since the logbias feature isn't in an official release of OpenSolaris yet). I kind of wish they made an 8G or 16G part, since the 32G capacity is kind of a waste.
>>
>> We had to go the NewEgg route though. We tried to buy some Sun-branded disks from Sun, but that's a different story. To summarize, we had to buy the NewEgg parts to ensure a project stayed on schedule.
>>
>> Generally, we've been pretty pleased with them. Occasionally, we've had an SSD that wasn't behaving well. Looks like you can replace log devices now though... :)
>>
>> We use the 2.5" to 3.5" SATA adapter from IcyDock, in a Sun X4540 drive sled. If you can attach a standard SATA disk to a Dell sled, this approach would most likely work for you as well. The only issue with using third-party parts is that the involved support organizations for the software/hardware will make it very clear that such a configuration is quite unsupported. That said, we've had pretty good luck with them.
>>
>> -Greg
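The untar workload above is dominated by many small synchronous writes, which is exactly what a ZIL accelerator absorbs. A local-disk sketch (Python; illustrative only, not the thread's NFS setup, and the file count and sizes are arbitrary) of the cost difference between buffered writes and an fsync after every small file:

```python
import os
import time
import tempfile

def write_files(directory, n, do_fsync):
    """Write n small files; optionally fsync each one, like a sync NFS client."""
    start = time.time()
    for i in range(n):
        path = os.path.join(directory, "file%d" % i)
        with open(path, "wb") as f:
            f.write(b"x" * 1024)
            if do_fsync:
                f.flush()
                os.fsync(f.fileno())  # force the write to stable storage
    return time.time() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        buffered = write_files(d, 200, do_fsync=False)
    with tempfile.TemporaryDirectory() as d:
        synced = write_files(d, 200, do_fsync=True)
    print("buffered: %.3fs  fsync-per-file: %.3fs" % (buffered, synced))
```

On spinning disks the fsync variant is typically an order of magnitude slower, which is the gap a slog device (or disabling the ZIL) closes.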
Re: [zfs-discuss] libzfs API: sharenfs, sharesmb, shareiscsi, $custom ?
It is llink, which was initially just an HTTP streamer for the Syabas/NetworkMediaTank, but I did just add UPnP, based on whtsup360 dumps.

http://lundman.net/wiki/index.php/Llink

You are welcome to try the latest development sources and test, but I believe the current state is that it shows up on the 360 but does not browse. Also, llink just shares content, no transcoding. At least not yet.

Regarding the original question: it seems it is easy to add my own attributes. I will peruse the API documentation for how to iterate over file systems looking for my attribute, since I would rather not hack it with system("zfs").

Lund

Ross wrote:
> Hi Jorgen,
>
> Does that software work to stream media to an Xbox 360? If so, could I have a play with it? It sounds ideal for my home server.
>
> cheers,
>
> Ross
Re: [zfs-discuss] libzfs API: sharenfs, sharesmb, shareiscsi, $custom ?
I cheated and simply used system() for the time being, but I will say that it was rather nice and easy. Thank you Sun and OpenSolaris people.

llink.conf:

# If you use ZFS, you can auto-export any filesystem with a certain attribute set.
# For example: zfs set net.lundman:sharellink=on zpool1/media
ROOT|ZFS=net.lundman:sharellink|PATH=/usr/sbin/zfs

@root.c:

debugf(": looking for ZFS filesystems\n");
snprintf(buffer, sizeof(buffer), "%s list -H -o mountpoint,%s", path, zfs);
spawn = lion_system(buffer, 0, LION_FLAG_FULFILL, zfs);
if (spawn) lion_set_handler(spawn, root_zfs_handler);

# zfs set net.lundman:sharellink=on zpool1/media
# ./llink -d -v 32
./llink - Jorgen Lundman v2.2.1 lund...@shinken.interq.or.jp build 1451 (Tue Aug 18 14:02:44 2009) (libdvdnav).
: looking for ZFS filesystems
: [root] recognising 'xtrailers'
: zfs command running
: zfs adding '/zpool1/media'
: [root] recognising '/zpool1/media'
: zfs command finished.
[main] ready!
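Since llink just parses the output of `zfs list`, the filtering step can be sketched shell-side as well. This is an illustrative stand-in for what the root_zfs_handler callback does, not llink's actual code; the sample pool names are made up:

```shell
# Sketch: keep only mountpoints whose user property is "on".
# `zfs list -H` separates fields with tabs and prints "-" when
# the property is not set on a dataset.
llink_zfs_filter() {
    awk -F'\t' '$2 == "on" { print $1 }'
}

# Sample of what `zfs list -H -o mountpoint,net.lundman:sharellink`
# might print; only /zpool1/media should survive the filter.
printf '/zpool1\t-\n/zpool1/media\ton\n/zpool1/backup\toff\n' | llink_zfs_filter
```

The same tab-separated contract is what makes `-H` (scripted mode) safer to parse than the default human-readable columns.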
[zfs-discuss] libzfs API: sharenfs, sharesmb, shareiscsi, $custom ?
Hello list,

As the developer of software that exports data/shares much like NFS and Samba do (HTTP/UPnP export, written in C), I am curious whether the libzfs API is flexible enough for me to create my own file-system attributes, similar to sharenfs, and obtain this information in my software. Perhaps something along the lines of:

zfs set shareupnp=on zpool1/media

I would then modify my streamer software to do the necessary calls to obtain the file-systems set to export. Or are there other suggestions to achieve similar results? I could mirror sharenfs, but I was under the impression that the API is flexible. The ultimate goal is to move away from static paths listed in the config file.

Lund
[zfs-discuss] x4540 dead HDD replacement, remains configured.
x4540 snv_117

We lost a HDD last night, and it seemed to take out most of the bus or something, forcing us to reboot. (We have yet to experience losing a disk that didn't force a reboot, mind you.)

So today I'm looking at replacing the broken HDD, but no amount of work makes it turn on the blue LED. After trying that for an hour, we just replaced the HDD anyway. But no amount of work will make it use/recognise the new disk. (We tried more than one working spare HDD too.) For example:

# zpool status
        raidz1      DEGRADED     0     0     0
          c5t1d0    ONLINE       0     0     0
          c0t5d0    ONLINE       0     0     0
          spare     DEGRADED     0     0  285K
            c1t5d0  UNAVAIL      0     0     0  cannot open
            c4t7d0  ONLINE       0     0     0  4.13G resilvered
          c2t5d0    ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0
        spares
          c4t7d0    INUSE     currently in use

# zpool offline zpool1 c1t5d0
        raidz1      DEGRADED     0     0     0
          c5t1d0    ONLINE       0     0     0
          c0t5d0    ONLINE       0     0     0
          spare     DEGRADED     0     0  285K
            c1t5d0  OFFLINE      0     0     0
            c4t7d0  ONLINE       0     0     0  4.13G resilvered
          c2t5d0    ONLINE       0     0     0
          c3t5d0    ONLINE       0     0     0

# cfgadm -al
Ap_Id                Type      Receptacle  Occupant    Condition
c1                   scsi-bus  connected   configured  unknown
c1::dsk/c1t5d0       disk      connected   configured  failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0       disk      connected   configured  failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0       disk      connected   configured  failed

# hdadm offline slot 13
 1:    5:    9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0       disk      connected   configured  failed

# fmadm faulty
FRU : HD_ID_47 (hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0) faulty
# fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47
# format | grep c1t5d0
#

# hdadm offline slot 13
 1:    5:    9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0       disk      connected   configured  failed

# ipmitool sunoem led get | grep 13
hdd13.fail.led   | ON
hdd13.ok2rm.led  | OFF

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device

Bah, why were they changed to SCSI? Increasing the size of the hammer...

# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al
c1::dsk/c1t5d0       disk      connected   configured  failed

I am fairly certain that if I reboot, it will all come back ok again. But I would like to believe that I should be able to replace a disk without rebooting on an X4540. Any other commands I should try?

Lund
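For reference, the sequence below is roughly what a hot replacement is supposed to look like on these boxes. This is a sketch only, reusing the device and pool names from the message above, and it assumes cfgadm actually cooperates, which is exactly what failed here:

```shell
# Typical hot-swap flow on an x4500/x4540 (sketch, not a verified fix):
zpool offline zpool1 c1t5d0            # take the dying disk out of service
cfgadm -c unconfigure c1::dsk/c1t5d0   # release the device node
#   ...physically swap the drive, wait for the blue ok-to-remove LED...
cfgadm -c configure c1::dsk/c1t5d0     # bring the new disk in
zpool replace zpool1 c1t5d0            # resilver onto the replacement
zpool detach zpool1 c4t7d0             # return the hot spare afterwards
```

In this case the unconfigure step is the part that keeps failing, which is why the hammer keeps growing.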
Re: [zfs-discuss] Lundman home NAS
The case is made by Chyangfun, and the model made for Mini-ITX motherboards is called CGN-S40X. They had 6 pcs left last I talked to them, and need a 3-week lead for more, if I understand correctly. I need to finish my LCD panel work before I open shop to sell these.

As for temperature, I have only checked the server HDDs so far (on my wiki) but will test with green HDDs tonight. I do not know if Solaris can retrieve the Atom chipset temperature readings. The parts I used should be listed on my wiki.

Anon wrote:
> I have the same case which I use as direct-attached storage. I never thought about using it with a motherboard inside. Could you provide a complete parts list? What sort of temperatures at the chip, chipset, and drives did you find? Thanks!
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
I suspect this is what it is all about:

# devfsadm -v
devfsadm[16283]: verbose: no devfs node or mismatched dev_t for /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0:a
[snip]

and indeed:

brw-r-----  1 root sys  30, 2311 Aug  6 15:34 s...@4,0:wd
crw-r-----  1 root sys  30, 2311 Aug  6 15:24 s...@4,0:wd,raw
drwxr-xr-x  2 root sys         2 Aug  6 14:31 s...@5,0
drwxr-xr-x  2 root sys         2 Apr 17 17:52 s...@6,0
brw-r-----  1 root sys  30, 2432 Jul  6 09:50 s...@6,0:a
crw-r-----  1 root sys  30, 2432 Jul  6 09:48 s...@6,0:a,raw

Perhaps because it was booted with the dead disk in place, it never configured the sd5 node under the mpt driver. Why the other hard disks work I don't know. I suspect the only way to fix this is to reboot again.

Lund

Jorgen Lundman wrote:
> x4540 snv_117
> We lost a HDD last night, and it seemed to take out most of the bus or something and forced us to reboot.
> [rest of original message snipped]
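When device nodes go stale like this, there are a couple of gentler-than-reboot things to try first. A sketch only, not verified against this particular half-configured mpt target:

```shell
# Remove dangling /devices and /dev entries, verbosely:
devfsadm -Cv
# Then ask devfsadm to (re)create device nodes for the sd driver:
devfsadm -i sd
```

Whether this can recover a device node that was never configured at boot is another question; it may well still end in the same reboot.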
Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.
Well, to be fair, there were some special cases. I know we had 3 separate occasions with broken HDDs when we were using UFS. 2 of these appeared to hang, and the 3rd only hung once we replaced the disk. This was most likely due to us using UFS on a zvol (for quotas). We got an IDR patch, and eventually this was released as "UFS 3-way deadlock writing log with zvol". I forget the bug number right now, but the patch is out.

This is the very first time we have lost a disk in a purely-ZFS system, and I was somewhat hoping that this would be the time everything went smoothly. But it did not. However, I have also experienced (once) a disk dying in such a way that it took out the whole chain in a netapp, so perhaps the disk died like that here too (it is really dead). But still disappointing.

Power cycling the x4540 takes about 7 minutes (service to service), but with Solaris svn116(?) and up it can do quiesce-reboots, which take about 57 seconds. In this case, we had to power cycle.

Ross wrote:
> Whoah!
>> We have yet to experience losing a disk that didn't force a reboot
> Do you have any notes on how many times this has happened Jorgen, or what steps you've taken each time? I appreciate you're probably more concerned with getting an answer to your question, but if ZFS needs a reboot to cope with failures on even an x4540, that's an absolute deal breaker for everything we want to do with ZFS.
> Ross
Re: [zfs-discuss] Lundman home NAS
100Mbit is quite flat at 11MB/s:
http://lundman.net/wiki/index.php/Lraid5_iozone#Solaris_10_64-bit.2C_OsX_10.5.5_NFSv3.2C_100MBit.2C_ZIL_cache_disabled

1Gbit, MTU 1500:
http://lundman.net/wiki/index.php/Lraid5_iozone#Solaris_10_64-bit.2C_OsX_10.5.5_NFSv3.2C_1GBit.2C_ZIL_cache_disabled

Not sure how to enable jumbo frames on the rge0. When I use "dladm set-linkprop -p mtu 9000 rge0" I get "operation not supported"; the property's PERM shows r--. Most likely I have to set it in rge.conf and reboot, but I would need to rebuild my USB image for that. (unplumb, modunload, modload, plumb did not seem to enable it either.)

Jorgen Lundman wrote:
> Ok I have redone the initial tests as 4G instead. Graphs are in the same place.
> [snip]
Re: [zfs-discuss] Lundman home NAS
Some preliminary speed tests, not too bad for a pci32 card.

http://lundman.net/wiki/index.php/Lraid5_iozone

Jorgen Lundman wrote:
> Finding a SATA card that would work with Solaris, and be hot-swap, and more than 4 ports, sure took a while.
> [snip]
Re: [zfs-discuss] Lundman home NAS
Ok I have redone the initial tests as 4G instead. Graphs are in the same place. http://lundman.net/wiki/index.php/Lraid5_iozone

I also mounted it with nfsv3 and ran more iozone. Alas, I started with 100mbit, so it has taken quite a while. It is constantly at 11MB/s though. ;)

Jorgen Lundman wrote:
> I was following Toms Hardware on how they test NAS units. I have 2GB memory, so I will re-run the test at 4, if I figure out which option that is. I used Excel for the graphs in this case, gnuplot did not want to work. (Nor did Excel, mind you.)
> Bob Friesenhahn wrote:
>> On Sat, 1 Aug 2009, Louis-Frédéric Feuillette wrote:
>>> I find the results suspect. 1.2GB/s read, and 500MB/s write! These are impressive numbers indeed. I then looked at the file sizes that iozone used... How much memory do you have? It seems like the files would be able to comfortably fit in memory. I think this test needs to be re-run with large files (ie 2*memory size) for them to give more accurate data.
>> The numbers are indeed suspect, but the iozone sweep test is quite useful in order to see the influence of zfs's caching via the ARC. The sweep should definitely be run to at least 2X the memory size.
>>> Unrelated, what did you use to generate those graphs? They look good.
>> Iozone output may be plotted via gnuplot or Microsoft Excel. This looks like the gnuplot output.
>> Bob
[zfs-discuss] Lundman home NAS
I have assembled my home RAID finally, and I think it looks rather good.

http://www.lundman.net/gallery/v/lraid5/p1150547.jpg.html

Feedback is welcome. I have yet to do proper speed tests; I will do so in the coming week should people be interested.

Even though I have tried to use only existing, cheap parts, the end sum became higher than I expected. Final price is somewhere in the 47,000 yen range (without hard disks). If I were to make and sell these, they would be 57,000 or so, so I do not really know if anyone would be interested, especially since SOHO NAS devices seem to start around 80,000. Anyway, it sure has been fun.

Lund
Re: [zfs-discuss] Lundman home NAS
Finding a SATA card that would work with Solaris, and be hot-swap, and have more than 4 ports, sure took a while. Oh, and be reasonably priced ;) Double the price of the dual-core Atom did not seem right.

The SATA card was a close fit to the jumper where the power-switch cable attaches, as you can see in one of the photos. This is because the MV8 card is quite long, and has the big plastic SATA sockets. It does fit, but it was the tightest spot. I also picked the 5-in-3 drive cage that had the shortest depth listed, 190mm. For example, the Supermicro M35T is 245mm, another 5cm. Not sure that would fit.

Lund

Nathan Fiedler wrote:
> Yes, please write more about this. The photos are terrific and I appreciate the many useful observations you've made. For my home NAS I chose the Chenbro ES34069, and the biggest problem was finding a SATA/PCI card that would work with OpenSolaris and fit in the case (technically impossible without a ribbon cable PCI adapter). After seeing this, I may reconsider my choice.
> For the SATA card, you mentioned that it was a close fit with the case power switch. Would removing the backplane on the card have helped?
> Thanks
> n
> On Fri, Jul 31, 2009 at 5:22 AM, Jorgen Lundman <lund...@gmo.jp> wrote:
>> I have assembled my home RAID finally, and I think it looks rather good.
>> [snip]
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
Bob Friesenhahn wrote:
> Something to be aware of is that not all SSDs are the same. In fact, some faster SSDs may use a RAM write cache (they all do) and then ignore a cache sync request, while not including hardware/firmware support to ensure that the data is persisted if there is power loss. Perhaps your fast CF device does that. If so, that would be really bad for zfs if your server was to spontaneously reboot or lose power. This is why you really want a true enterprise-capable SSD device for your slog.

Naturally; we just wanted to try the various technologies to see how they compared. The store-bought CF card took 26s, the store-bought SSD 48s. We have not found a PCI NVRAM card yet. When talking to our Sun vendor, they have no solutions, which is annoying.

The X25-E would be good, but some pools have no spares, and since you can't remove vdevs, we'd have to move all customers off the x4500 before we can use it. CF cards need a reboot to be seen, and 6 of our servers are x4500s, not x4540s, so that is not really a global solution. PCI NVRAM cards also need a reboot, but should work in both x4500 and x4540 without rebuilding the zpool. But I can't actually find any with Solaris drivers. Peculiar.

Lund
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
> X25-E would be good, but some pools have no spares, and since you can't remove vdevs, we'd have to move all customers off the x4500 before we can use it.

Ah, it just occurred to me that perhaps for our specific problem we could buy two X25-Es and replace the root mirror. The OS and ZIL slogs can live together, and /var can move to the data pool. That way we would not need to rebuild the data pool and do all the work that comes with that.

Shame I can't zpool replace to a smaller disk (500GB HDD to 32GB SSD) though, so I will have to lucreate and reboot one time.

Lund
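That plan can be sketched as commands. The slice layout, device names, and pool names below are illustrative assumptions, not a tested recipe, and the lucreate/activate step onto the smaller SSD-backed root still has to come first:

```shell
# Hypothetical layout: slice 0 carries the new root pool, slice 1 the slog.
# After lucreate/luactivate onto the first SSD:
zpool attach rpool c2t0d0s0 c2t1d0s0              # mirror root across both SSDs
zpool add datapool log mirror c2t0d0s1 c2t1d0s1   # mirrored slog on the s1 slices
```

Carving the slog out of the root SSDs avoids touching the data pool's vdevs at all, which is the whole point of the idea.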
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
We just picked up the fastest SSD we could in the local Bic Camera, which turned out to be a CSSD-SM32NI, with a supposed 95MB/s write speed. I put it in place, and moved the slog over:

0m49.173s
0m48.809s

So it is slower than the CF test. This is disappointing. Everyone else seems to use the Intel X25-M, which has a write speed of 170MB/s (2nd generation), so perhaps that is why it works better for them. It is curious that it is slower than the CF card. Perhaps because it shares the bus with so many other SATA devices?

Oh, and we'll probably have to get a 3.5" frame for it, as I doubt it'll stay standing after the next earthquake. :)

Lund

Jorgen Lundman wrote:
> This thread started over in nfs-discuss, as it appeared to be an nfs problem initially. Or at the very least, interaction between nfs and zil.
> [rest of original message snipped]
Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08
This thread started over in nfs-discuss, as it appeared to be an nfs problem initially. Or at the very least, interaction between nfs and zil. Just summarising speeds we have found when untarring something. Always in a new/empty directory. Only looking at write speed. read is always very fast. The reason we started to look at this was because the 7 year old netapp being phased out, could untar the test file in 11 seconds. The x4500/x4540 Suns took 5 minutes. For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I had lying around, but it can be downloaded here if you want the same test. (http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz) The command executed generally, is: # mkdir .test34 time gtar --directory=.test34 -zxf /tmp/MTOS-4.261-ja.tar.gz Solaris 10 1/06 intel client: netapp 6.5.1 FAS960 server: NFSv3 0m11.114s Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 server: nfsv4 5m11.654s Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv3 8m55.911s Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv4 10m32.629s Just untarring the tarball on the x4500 itself: : x4500 OpenSolaris svn117 server 0m0.478s : x4500 Solaris 10 10/08 server 0m1.361s So ZFS itself is very fast. Replacing NFS with different protocols, identical setup, just changing tar with rsync, and nfsd with sshd. The baseline test, using: rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync on nfsv4 3m44.857s Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync+ssh 0m1.387s So, get rid of nfsd and it goes from 3 minutes to 1 second! Lets share it with smb, and mount it: OsX 10.5.6 intel client: x4500 OpenSolaris svn117 : smb+untar 0m24.480s Neat, even SMB can beat nfs in default settings. This would then indicate to me that nfsd is broken somehow, but then we try again after only disabling ZIL. 
Solaris 10 6/06 : x4500 OpenSolaris svn117, ZIL disabled : NFSv4  0m8.453s 0m8.284s 0m8.264s

Nice, so this is theoretically the fastest NFS speed we can reach? We run postfix+dovecot for mail, which would probably be safe without the ZIL. The other workload is FTP/WWW/CGI, which has more active writes/updates; probably not as good a candidate. Comments?

Next, enable the ZIL but disable the ZFS cache flush (just as a test; I have been told disabling the cache flush is far more dangerous):

Solaris 10 6/06 : x4500 OpenSolaris svn117, zfscacheflush disabled : NFSv4  0m45.139s

Interesting. Anyway, re-enable the ZIL and cache flush again, and learn a whole lot about slogs. First I tried creating a 2G slog on the boot mirror:

Solaris 10 6/06 : x4500 OpenSolaris svn117, slog on boot pool : NFSv4  1m59.970s

Some improvement. For a lark, I created a 2GB file in /tmp/ and changed the slog to that. (I know, having the slog in volatile RAM is pretty much the same as disabling the ZIL, but it should give me the theoretical maximum speed with the ZIL enabled, right?)

Solaris 10 6/06 : x4500 OpenSolaris svn117, slog /tmp/junk : NFSv4  0m8.916s

Nice! Same speed as with the ZIL disabled. Since this is an X4540, we thought we would test with a CF card attached. Alas, the 600X (92MB/s) cards are not out until next month, rats! So we bought a 300X (40MB/s) card:

Solaris 10 6/06 : x4500 OpenSolaris svn117, slog on 300X CF card : NFSv4  0m26.566s

Not too bad really. But you have to reboot to see a CF card, fiddle with the BIOS boot order, etc. It is just not an easy addition on a live system; a SATA-emulated SSD can be hot-swapped. Also, I learned an interesting lesson about rebooting with the slog at /tmp/junk.

I am hoping to pick up a SATA SSD today and see what speeds we get out of that. The rsync (1s) vs NFS (8s) difference I can accept as overhead of a much more complicated protocol, but why would it take 3 minutes to write the same data to the same pool with rsync (1s) vs NFS (3m)? The ZIL was on, the slog was default, but both were writing the same way.
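For reference, the ZIL and cache-flush toggles used in these tests were, in builds of that era, global tunables affecting every pool on the system (the mdb one-liners below are the commonly circulated form from that period and are shown as a sketch; later builds replaced zil_disable with the per-dataset 'sync' property):

```shell
# Persistent form, in /etc/system (takes effect at next boot):
#   set zfs:zil_disable = 1
#   set zfs:zfs_nocacheflush = 1

# Live form, via the kernel debugger (at your own risk, test boxes only):
echo zil_disable/W0t1 | mdb -kw        # disable the ZIL
echo zfs_nocacheflush/W0t1 | mdb -kw   # disable cache flushes

# Write 0 instead of 1 to re-enable either.
```

Remounting the filesystem (or unmount/mount) was needed for zil_disable to take effect on an already-mounted dataset.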
Does nfsd turn every close() into a synchronous commit (fsync) regardless of whether the application asked for one? This I have not yet wrapped my head around. For example, I know rsync and tar do not fsync on close() (but dovecot does), so does NFS make the close synchronous anyway?

Sorry for the giant email.

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
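One way to check what the applications themselves do is to trace the relevant system calls with Solaris truss; a quick sketch (the tarball path matches the test above, the target directory name is illustrative):

```shell
# Count fsync() calls issued while untarring locally on the server.
# tar and rsync are expected to show none; dovecot-style writers many.
truss -f -t close,fsync gtar --directory=.test35 -zxf \
    /tmp/MTOS-4.261-ja.tar.gz 2>&1 | grep -c fsync
```

Over NFS, though, the synchronous behaviour comes from the protocol rather than the client's syscalls: with close-to-open consistency the client flushes and commits dirty data when a file is closed, which is why nfsd hits the ZIL even for applications that never call fsync.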
Re: [zfs-discuss] ZFS Mirror cloning
Darren J Moffat wrote:

Maybe the 2-disk mirror is a special enough case that this could be worth allowing without having to deal with all the other cases as well. The only reason I think it is a special enough case is because it is the config we use for the root/boot pool. See 6849185 and 5097228.

Ah, of course, you have a valid point, and mirrors can be used in much more complicated situations. Been reading your blog all day while impatiently waiting for zfs-crypto...

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Mirror cloning
That is because you had only one other choice: a filesystem-level copy. With ZFS I believe you will find that snapshots give you better control over this. The send/receive process is very, very similar to a mirror resilver, so you are only carrying your previous process forward into a brave new world. You'll find that send/receive is much more flexible than broken mirrors can be. -- richard

Perhaps, but when the crunch is on, it is hard to beat the 3-minute cloning. zfs send will not be done in 3 minutes, especially if the version used predates the zfs send speed fixes, like the official Sol 10 10/08. (I am not sure, but zfs send sounds like you already need the 2nd server set up and running with IPs etc?)

Anyway, we have found a procedure now, so it is all possible. But it would have been nicer to be able to detach the disk politely ;)

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
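For comparison, the send/receive path Richard describes would look roughly like this; the hostname and dataset names here are made up, and the receiving machine has to be installed and reachable before you start, which is exactly the setup cost being debated:

```shell
# On the source: take a recursive snapshot and stream it to the clone.
# 'tank/fs' and 'newhost' are hypothetical names for illustration.
zfs snapshot -r tank/fs@clone
zfs send -R tank/fs@clone | ssh newhost zfs recv -Fd tank

# -R includes descendant datasets, snapshots and properties;
# -F rolls the receiving side back so it matches the stream.
```

Unlike pulling a mirror disk, this carries no stale pool labels or system-id to clean up on the destination.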
Re: [zfs-discuss] ZFS Mirror cloning
Ok, so it seems that with DiskSuite, detaching a mirror does nothing to the disk you detached. However, zpool detach appears to mark the disk as blank, so nothing will find any pools (import, import -D, etc.). zdb -l will still show labels, but no amount of work that we have found will bring the HDD back online in the new server: grub is blank, and findroot cannot see any pool.

zpool will not let you offline the 2nd disk in a mirror. This is incorrect behaviour. You cannot cfgadm unconfigure the SATA device while zpool holds the disk. We can just yank the disk, but we had issues getting a new blank disk recognised after that; cfgadm would not release the old disk. However, we found we can do this:

# cfgadm -x sata_port_deactivate sata0/1::dsk/c0t1d0

This makes zpool mark it:

c0t1d0s0  REMOVED  0 0 0

and eventually:

c0t1d0s0  FAULTED  0 0 0  too many errors

After that, we pull out the disk and issue:

# zpool detach zboot c0t1d0s0
# cfgadm -x sata_port_activate sata0/1::dsk/c0t1d0
# cfgadm -c configure sata0/1::dsk/c0t1d0
# format      (fdisk, partition as required to be the same)
# zpool attach zboot c0t0d0s0 c0t1d0s0

There is one final thing to address: when the disk is used in a new machine, it will generally panic with "pool was used previously with system-id xx", which requires more miniroot work. It would be nice to be able to avoid this as well, but you can't export the / pool before pulling out the disk, either.

Jorgen Lundman wrote:

Hello list, Before we started changing to ZFS bootfs, we used DiskSuite-mirrored ufs boot. Very often, if we needed to grow a cluster by another machine or two, we would simply clone a running live server. Generally the procedure for this would be:

1. detach the 2nd HDD, metaclear, and delete the metadb on the 2nd disk.
2. mount the 2nd HDD under /mnt, and change system/vfstab to be a single boot HDD, no longer mirrored, as well as the host name and IP addresses.
3. bootadm update-archive -R /mnt
4. unmount, cfgadm unconfigure, and pull out the HDD.
and generally, in about ~4 minutes, we have a new live server in the cluster.

We tried to do the same thing today, but with a ZFS bootfs. We did:

1. zpool detach on the 2nd HDD.
2. cfgadm unconfigure the HDD, and pull out the disk.

The source server was fine: we could insert a new disk, attach it, and it resilvered. However, the new destination server had lots of issues. At first, grub would give no menu at all, just the grub> command prompt. The command findroot(pool_zboot,0,a) would return "Error 15: No such file". After booting a Solaris Live CD, I could zpool import the pool, but of course it was in degraded mode etc. Now it would show a menu, but if you booted it, it would flash the message that the pool was last accessed by Solaris $sysid, and panic. After a lot of reboots and fiddling, I managed to get miniroot to at least boot; then, only after inserting a new HDD and letting the pool become completely healthy would it let me boot into multi-user.

Is there something we should do, perhaps, that would let the cloning procedure go smoothly? Should I export the now-separated disk somehow? In fact, can I mount that disk to make changes to it before pulling out the disk? Most documentation on cloning uses zfs send, which would be possible, but 4 minutes is hard to beat when your cluster is under heavy load.

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
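The steps scattered through this thread can be gathered into one sequence (a sketch of exactly the procedure described above; the pool name zboot and the sata0/1 / c0t1d0 device names are from our setup and will differ elsewhere):

```shell
# 1. Deactivate the SATA port so the mirror disk drops out cleanly:
cfgadm -x sata_port_deactivate sata0/1::dsk/c0t1d0
# wait until 'zpool status' shows c0t1d0s0 as REMOVED/FAULTED,
# then physically pull the disk (this is the clone for the new box).

# 2. Detach the now-absent device and bring the port back:
zpool detach zboot c0t1d0s0
cfgadm -x sata_port_activate sata0/1::dsk/c0t1d0
cfgadm -c configure sata0/1::dsk/c0t1d0

# 3. Partition the replacement disk to match (interactive: fdisk,
#    partition, SMI label), then re-attach and let it resilver:
format
zpool attach zboot c0t0d0s0 c0t1d0s0
```

Note the ordering: the port is deactivated before zpool detach, because detaching while the disk is still writable is what scribbles over its labels.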
Re: [zfs-discuss] ZFS Mirror cloning
Jorgen Lundman wrote: However, zpool detach appears to mark the disk as blank, so nothing will find any pools (import, import -D etc). zdb -l will show labels,

For kicks, I tried to demonstrate that this does indeed happen, so I dd'ed the first 1024 1k blocks from the disk, ran zpool detach on it, then dd'ed the image back out to the HDD. Pulled out the disk, and it boots directly without any intervention. If only zpool detach had a flag to tell it not to scribble over the detached disk. I guess I could diff the before and after disk images and work out what it does, and write a tool to undo it, or figure out whether I can undo it using zdb.

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
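The before/after trick, written out (a sketch of the steps described above; the device name is from this thread, the backup path is illustrative, and restoring raw labels like this is obviously at your own risk):

```shell
# Save the first 1MB of the slice before detaching; this covers the
# two front ZFS labels (L0/L1) plus the boot block area:
dd if=/dev/rdsk/c0t1d0s0 of=/var/tmp/label-backup bs=1k count=1024

# Detach, which scribbles over the labels on the live disk:
zpool detach zboot c0t1d0s0

# Put the saved blocks back so the pulled disk still looks like a
# bootable pool member to the destination machine:
dd if=/var/tmp/label-backup of=/dev/rdsk/c0t1d0s0 bs=1k count=1024
```

This does not touch the two backup labels at the end of the device, which is presumably why zdb -l still showed labels even after a plain detach.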
Re: [zfs-discuss] Solaris live CD that supports ZFS root mount for fs fixes
We used the OpenSolaris preview 2010.02 DVD from genunix.org to fix our broken zboot after the attempted clone. Its zpool and zfs tools were enough to import, re-mount, etc.

Lund

Matt Weatherford wrote: Hi, I borked a libc.so library file on my Solaris 10 server (zfs root) and was wondering if there is a good live CD that will be able to mount my ZFS root fs, so that I can make this quick repair on the system boot drive and get running again. Are all ZFS roots created equal? It's an x86 Solaris 10 box. If I boot a BeleniX live CD, will it be able to mount this ZFS root? Thanks, Matt

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
I have no idea. I downloaded the script from Bob without modification and ran it, specifying only the name of our pool. Should I have changed something to run the test? We have two kinds of x4500/x4540: those with Sol 10 10/08, and 2 running svn117 for ZFS user quotas. Worth trying on both?

Lund

Ross wrote: Jorgen, am I right in thinking the numbers here don't quite work? 48M blocks is just 9,000 files isn't it, not 93,000? I'm asking because I had to repeat a test earlier - I edited the script with vi, but when I ran it, it was still using the old parameters. I ignored it as a one-off, but I'm wondering if your test has done a similar thing. Ross

x4540 running svn117
# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
creating data file set 93000 files of 8192000 bytes0 under /zpool1/zfscachetest ... done1
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest
doing initial (unmount/mount) 'cpio -o . /dev/null'
48000247 blocks

real    4m7.13s
user    0m9.27s
sys     0m49.09s

doing second 'cpio -o . /dev/null'
48000247 blocks

real    4m52.52s
user    0m9.13s
sys     0m47.51s

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Ah yes, my apologies! I haven't quite worked out why the OS X VNC server can't handle keyboard mappings; I have to copy/paste even for "@". As I pasted the output into my mail over VNC, it would have destroyed the (not very) unusual characters.

Ross wrote: Aaah, never mind, it looks like there's just a rogue 9 appeared in your output. It was just a standard run of 3,000 files.

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
I also ran this on my future RAID/NAS: Intel Atom 330 (D945GCLF2), dual core 1.6GHz, on a single-HDD pool. snv_114, 64-bit, 2GB RAM.

bash-3.2# ./zfs-cache-test.ksh zboot
zfs create zboot/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /zboot/zfscachetest ... Done!
zfs unmount zboot/zfscachetest
zfs mount zboot/zfscachetest
Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real    7m45.96s
user    0m6.55s
sys     1m20.85s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real    7m50.35s
user    0m6.76s
sys     1m32.91s

Feel free to clean up with 'zfs destroy zboot/zfscachetest'.

Bob Friesenhahn wrote: There has been no forward progress on the ZFS read performance issue for a week now. A 4X reduction in file read performance due to having read the file before is terrible, and of course the situation is considerably worse if the file was previously mmapped as well. Many of us have sent a lot of money to Sun and were not aware that ZFS is sucking the life out of our expensive Sun hardware. It is trivially easy to reproduce this problem on multiple machines. For example, I reproduced it on my Blade 2500 (SPARC), which uses a simple mirrored rpool. On that system there is a 1.8X read slowdown from the file being accessed previously. In order to raise visibility of this issue, I invite others to see if they can reproduce it in their ZFS pools. The script at http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh implements a simple test. It requires a fair amount of disk space to run, but the main requirement is that the disk space consumed be more than available memory, so that file data gets purged from the ARC. The script needs to run as root since it creates a filesystem and uses mount/umount. The script does not destroy any data. There are several adjustments which may be made at the front of the script.
The pool 'rpool' is used by default, but the name of the pool to test may be supplied via an argument, similar to:

# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /Sun_2540/zfscachetest ... Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest
Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    2m54.17s
user    0m7.65s
sys     0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    11m54.65s
user    0m7.70s
sys     0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /rpool/zfscachetest ... Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest
Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real    13m3.91s
user    2m43.04s
sys     9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real    23m50.27s
user    2m41.81s
sys     9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

I am interested to hear about systems which do not suffer from this bug.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
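The essence of Bob's methodology is: write more file data than fits in RAM, unmount/mount to empty the ARC for a cold read, then read the same files again warm. A simplified sketch (this is not the actual zfs-cache-test.ksh, which adds sizing checks and options; the pool name is an assumption, and it needs roughly 24GB free):

```shell
#!/bin/sh
# Simplified sketch of the cache-test methodology. Run as root.
POOL=${1:-rpool}

zfs create $POOL/zfscachetest
cd /$POOL/zfscachetest || exit 1

# Create 3000 files of 8192000 bytes each (~24GB total):
i=1
while [ $i -le 3000 ]; do
    dd if=/dev/zero of=file$i bs=8192000 count=1 2>/dev/null
    i=$((i+1))
done

# Unmount/mount to invalidate cached file data, then time a cold read:
cd /
zfs unmount $POOL/zfscachetest
zfs mount $POOL/zfscachetest
cd /$POOL/zfscachetest
time sh -c 'find . -type f | cpio -C 131072 -o > /dev/null'

# Second pass over the same files; on affected systems this is
# paradoxically slower than the cold read:
time sh -c 'find . -type f | cpio -C 131072 -o > /dev/null'
```

The bug signature is the second `time` being larger than the first, as in Bob's 2m54s vs 11m54s run above.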
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
    c2t2d0  ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    c3t1d0  ONLINE       0     0     0
    c4t1d0  ONLINE       0     0     0
    c5t1d0  ONLINE       0     0     0
    c6t1d0  ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
    c2t3d0  ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    c3t2d0  ONLINE       0     0     0
    c4t2d0  ONLINE       0     0     0
    c5t2d0  ONLINE       0     0     0
    c6t2d0  ONLINE       0     0     0
    c1t4d0  ONLINE       0     0     0
    c2t4d0  ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    c3t3d0  ONLINE       0     0     0
    c4t3d0  ONLINE       0     0     0
    c5t3d0  ONLINE       0     0     0
    c6t3d0  ONLINE       0     0     0
    c1t5d0  ONLINE       0     0     0
    c2t5d0  ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    c3t4d0  ONLINE       0     0     0
    c4t4d0  ONLINE       0     0     0
    c5t4d0  ONLINE       0     0     0
    c6t4d0  ONLINE       0     0     0
    c1t6d0  ONLINE       0     0     0
    c2t6d0  ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    c3t5d0  ONLINE       0     0     0
    c4t5d0  ONLINE       0     0     0
    c5t5d0  ONLINE       0     0     0
    c6t5d0  ONLINE       0     0     0
    c1t7d0  ONLINE       0     0     0
    c2t7d0  ONLINE       0     0     0
  spares
    c3t6d0  AVAIL
    c4t6d0  AVAIL
    c5t6d0  AVAIL
    c6t6d0  AVAIL

errors: No known data errors

zfs create zpool1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /zpool1/zfscachetest ... Done!
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest
Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real    3m5.51s
user    0m1.70s
sys     0m29.53s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real    4m7.63s
user    0m1.67s
sys     0m26.66s

Feel free to clean up with 'zfs destroy zpool1/zfscachetest'.

Intel Atom:

bash-3.2# ./zfs-cache-test.ksh zboot
System Configuration:
System architecture: i386
System release level: 5.11 snv_114
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86

Pool configuration:
  pool: zboot
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        zboot     ONLINE     0     0     0
          c1d0s0  ONLINE     0     0     0

errors: No known data errors

zfs create zboot/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under /zboot/zfscachetest ... Done!
zfs unmount zboot/zfscachetest
zfs mount zboot/zfscachetest
Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real    7m27.87s
user    0m6.51s
sys     1m20.28s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real    7m25.34s
user    0m6.63s
sys     1m32.04s

Feel free to clean up with 'zfs destroy zboot/zfscachetest'.

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
You have some mighty pools there. Something I find quite interesting is that those who have mighty pools generally obtain about the same data rate regardless of their relative degree of excessive might. This causes me to believe that the Solaris kernel is throttling the read rate, so that throwing more and faster hardware at the problem does not help.

Are you saying the X4500s we have are set up incorrectly, or in a way which will make them run poorly? The servers came with no documentation or advice, and I have yet to find a good place that suggests configurations for dedicated x4500 NFS servers. We had to find out about NFSD_SERVERS when the first trouble came in (followed by 5 other tweaks and limits-reached troubles). If Sun really wants to compete with NetApp, you'd think they would ship us hardware configured as NFS servers, not x4500s configured as desktops :( They are cheap though! Nothing like being the Wal-Mart of storage! That is how the pools were created as well. Admittedly it may be down to our vendor again.

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
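For anyone else hunting for it, the nfsd knob mentioned above lives in /etc/default/nfs on Solaris 10; the value below is only an example, not a recommendation:

```shell
# /etc/default/nfs: raise the ceiling on concurrent nfsd threads
# from the conservative default (1024 here is illustrative only).
NFSD_SERVERS=1024
```

After editing, restart the NFS server service with `svcadm restart svc:/network/nfs/server` for the new value to take effect.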
[zfs-discuss] ZFS Mirror cloning
Hello list,

Before we started changing to ZFS bootfs, we used DiskSuite-mirrored ufs boot. Very often, if we needed to grow a cluster by another machine or two, we would simply clone a running live server. Generally the procedure for this would be:

1. detach the 2nd HDD, metaclear, and delete the metadb on the 2nd disk.
2. mount the 2nd HDD under /mnt, and change system/vfstab to be a single boot HDD, no longer mirrored, as well as the host name and IP addresses.
3. bootadm update-archive -R /mnt
4. unmount, cfgadm unconfigure, and pull out the HDD.

and generally, in about ~4 minutes, we have a new live server in the cluster.

We tried to do the same thing today, but with a ZFS bootfs. We did:

1. zpool detach on the 2nd HDD.
2. cfgadm unconfigure the HDD, and pull out the disk.

The source server was fine: we could insert a new disk, attach it, and it resilvered. However, the new destination server had lots of issues. At first, grub would give no menu at all, just the grub> command prompt. The command findroot(pool_zboot,0,a) would return "Error 15: No such file". After booting a Solaris Live CD, I could zpool import the pool, but of course it was in degraded mode etc. Now it would show a menu, but if you booted it, it would flash the message that the pool was last accessed by Solaris $sysid, and panic. After a lot of reboots and fiddling, I managed to get miniroot to at least boot; then, only after inserting a new HDD and letting the pool become completely healthy would it let me boot into multi-user.

Is there something we should do, perhaps, that would let the cloning procedure go smoothly? Should I export the now-separated disk somehow? In fact, can I mount that disk to make changes to it before pulling out the disk? Most documentation on cloning uses zfs send, which would be possible, but 4 minutes is hard to beat when your cluster is under heavy load.
Lund -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
x4540 running svn117

# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
creating data file set 93000 files of 8192000 bytes0 under /zpool1/zfscachetest ... done1
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest
doing initial (unmount/mount) 'cpio -o . /dev/null'
48000247 blocks

real    4m7.13s
user    0m9.27s
sys     0m49.09s

doing second 'cpio -o . /dev/null'
48000247 blocks

real    4m52.52s
user    0m9.13s
sys     0m47.51s

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] how to discover disks?
If you want to use an entire disk in a zpool, you use the notation without the trailing s? slice part, i.e. c2d0. (SATA-related disks do not have the t? target part, whereas SCSI and SCSI-emulated devices, like CD-ROMs, USB, etc., do.) If you are using just part of a disk, one partition/slice, you use the s? notation, for example c2d0s6.

There is one caveat: x86 bootable HDDs need an SMI label; EFI labels will not work. So for bootable root pools, it has to be a slice. Run format on the disk and create your partition the way you want it, probably just s0 spanning the entire disk (not counting the virtual s8 boot partition, and of course the entire-disk partition s2). Then write it as an SMI label, and you can attach it to your root pool. It usually reminds you to run installgrub on the disk too.

I am not an expert on this; this is just what I have found out so far.

Lund

Hua-Ying Ling wrote: When I use cfgadm -a it only seems to list USB devices?

#cfgadm -a
Ap_Id       Type     Receptacle   Occupant      Condition
usb2/1      unknown  empty        unconfigured  ok
usb2/2      unknown  empty        unconfigured  ok
usb2/3      unknown  empty        unconfigured  ok
usb3/1      unknown  empty        unconfigured  ok

I'm trying to convert a non-redundant storage pool to a mirrored pool, following the ZFS admin guide on page 71. I currently have an existing rpool:

#zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        rpool     ONLINE     0     0     0
          c3d0s0  ONLINE     0     0     0

I want to mirror this drive. I tried using format to get the disk name:

#format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3d0 DEFAULT cyl 24318 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@14,1/i...@0/c...@0,0
       1. c3d1 drive type unknown
          /p...@0,0/pci-...@14,1/i...@0/c...@1,0

So I tried:

#zpool attach rpool c3d0s0 c3d1s0    // failed
cannot open '/dev/dsk/c3d1s0': No such device or address
#zpool attach rpool c3d0s0 c3d1      // failed
cannot label 'c3d1': EFI labeled devices are not supported on root pools.

Thoughts?
Thanks, Hua-Ying

On Mon, Jul 6, 2009 at 2:37 AM, Carsten Aulbert carsten.aulb...@aei.mpg.de wrote: Hi,

Hua-Ying Ling wrote: How do I discover the disk name to use for zfs commands, such as c3d0s0? I tried using the format command but it only gave me the first 4 letters: c3d1. Also, why do some commands accept only 4-letter disk names and others require 6 letters?

Usually I find cfgadm -a helpful enough for that (maybe adding '| grep disk' to it). Why sometimes 4 and sometimes 6 characters:

c3d1   - this would be disk #1 on controller #3
c3d0s0 - this would be slice #0 (partition) on disk #0 on controller #3

Usually there is also a t0 in there, e.g.:

cfgadm -a | grep disk | head
sata0/0::dsk/c0t0d0    disk   connected   configured   ok
sata0/1::dsk/c0t1d0    disk   connected   configured   ok
sata0/2::dsk/c0t2d0    disk   connected   configured   ok
sata0/3::dsk/c0t3d0    disk   connected   configured   ok
sata0/4::dsk/c0t4d0    disk   connected   configured   ok
sata0/5::dsk/c0t5d0    disk   connected   configured   ok
sata0/6::dsk/c0t6d0    disk   connected   configured   ok
sata0/7::dsk/c0t7d0    disk   connected   configured   ok
sata1/0::dsk/c1t0d0    disk   connected   configured   ok
sata1/1::dsk/c1t1d0    disk   connected   configured   ok

HTH Carsten

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
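Putting Jorgen's advice together, attaching the second disk to an x86 root pool looks roughly like this (a sketch; the device names are the ones from this thread, and the format step is interactive):

```shell
# Give the new disk an SMI label with s0 spanning the whole disk
# (interactive: fdisk, then partition, then label -> SMI):
format c3d1

# Attach the slice, not the bare disk, to the root pool; bare-disk
# attach fails because it would write an EFI label:
zpool attach rpool c3d0s0 c3d1s0

# Once resilvered, make the second disk bootable too:
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3d1s0
```

Without the installgrub step the mirror protects the data but the machine cannot boot from the second disk.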
Re: [zfs-discuss] Interposing on readdir and friends
I am no expert, but I recently wrote a wrapper for my media players that expands .RAR archives and presents the files inside as regular contents of the directory. It may give you a starting point:

wiki    http://lundman.net/wiki/index.php/Librarchy
tarball http://www.lundman.net/ftp/librarcy/librarcy-1.0.3.tar.gz
CVSweb  http://www.lundman.net/cvs/viewvc.cgi/lundman/librarcy/

Lund

Peter Tribble wrote: We've just stumbled across an interesting problem in one of our applications that fails when run on a ZFS filesystem. I don't have the code, so I can't fix it at source, but it relies on the fact that if you do readdir() on a directory, the files come back in the order they were added to the directory. This appears to be true (within certain limitations) on UFS, but certainly isn't true on ZFS. Is there any way to force readdir() to return files in a specific order? (On UFS, we have a script that creates symlinks in the correct order. Ugly, but it seems to have worked for many years.) If not, I was looking at interposing my own readdir() (assuming the application is using readdir()) that actually returns the entries in the desired order. However, I'm having a bit of trouble hacking this together (the current source doesn't compile in isolation on my S10 machine).

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Open Solaris version recommendation? b114, b117?
We have been told we can finally have support for OpenSolaris, so we can move the ufs-on-zvol setup over to ZFS with user quotas. Does anyone have a feel for which OpenSolaris versions with ZFS user quotas are solid? We will put it on the x4540 for customers. I have run b114 for about 5 weeks and have yet to experience any problems, but b117 is what the 2010.02 release will be based on, so perhaps that is a better choice. Other versions worth considering? I know it's a bit vague, but perhaps there is a known panic in a certain version that I may not be aware of.

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best controller card for 8 SATA drives ?
I only have a 32-bit PCI bus in the Intel Atom 330 board, so I have no choice but to be slower, but I can confirm that the Supermicro DAC-SATA-MV8 (SATA-I) card works just fine and does show up in cfgadm (hot-swapping is possible). I have been told the AOC-SAT2-MV8 (SATA-II) works as well, but I have not personally tried it.

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] PicoLCD Was: Best controller card for 8 SATA drives ?
I hesitate to post this question here, since the relation to ZFS is tenuous at best (zfs to sata controller to LCD panel). But maybe someone has already been down this path before me. Looking at building a RAID, with osol and zfs, I naturally want a front-panel. I was looking at something like; http://www.mini-box.com/picoLCD-256x64-Sideshow-CDROM-Bay Since it appears to come with OpenSource drivers. Based on lcd4linux, which I can compile with marginal massaging. Has anyone run this successfully with osol? It appears to handle mrtg directly, so I should be able to graph a whole load of ZFS data. Has someone already been down this road too? -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on 32 bit?
casper@sun.com wrote: It's true for most of the Intel Atom family (Zxxx and Nxxx, but not the 230 and 330, as those are 64-bit). Those are new systems. Casper

I've actually just started to build my home RAID using the Atom 330 (D945GCLF2):

Status of virtual processor 0 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:04.
  The i386 processor operates at 1600 MHz,
  and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:24.
  The i386 processor operates at 1600 MHz,
  and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:24.
  The i386 processor operates at 1600 MHz,
  and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:26.
  The i386 processor operates at 1600 MHz,
  and has an i387 compatible floating point processor.

and it booted 64-bit just fine. (I thought uname -a showed that, but apparently it does not.) The only annoyance is that the onboard ICH7 is the $27c0 and not the $27c1 (with AHCI mode for hot-swapping). But I was always planning on adding a SATA PCI card anyway, since I need more than 2 HDDs.

But to stay on topic: it sounds like Richard Elling summed it up nicely, which is something Richard is really good at.

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
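On the uname point: the Solaris command that actually reports kernel bitness is isainfo, not uname:

```shell
# uname -a prints i86pc regardless of whether the kernel booted
# 32-bit or 64-bit; isainfo is definitive:
isainfo -kv
# e.g. "64-bit amd64 kernel modules" on a 64-bit boot

isainfo -b
# prints the number of address bits (64 or 32)
```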
Re: [zfs-discuss] Can the new consumer NAS devices run OpenSolaris?
It's not what you want, but I use a full-sized PC located on the balcony and don't worry about heat, vibration and noise. AFAIK, this is the cheapest, simplest and quickest way to build a custom server. I even know people who put a 19" rack with a set of servers on the balcony too. =) And of course I don't understand the strong necessity to keep this HDD server near you. Are you going to swap HDDs every day? Or do you like to see it while working? %-) The 1000BaseTX standard allows up to 105 meters of distance, remember that ;-)

I have played with the idea of using the balcony. But in summer, it hits 40C+ for a couple of months, so if they were in a rack or similar storage it would get even hotter. I would also have to have AC for it. Inside, we already have AC on :)

I wanted something more hands-off really. Just a little box plugged in, replace the HDDs when the red light comes on, the rest is automatic. But of course, at the same time, it is MY data, so I'd rather it was using ZFS and so on.

The Thecus and QNAP RAIDs both use Intel chipsets. I am curious, if I picked up an empty box 2nd hand for next-to-nothing, whether I couldn't re-flash it with osol, or EON, or FreeNAS.

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..
I changed to try zfs send on a UFS on zvolume as well: received 92.9GB stream in 2354 seconds (40.4MB/sec) Still fast enough to use. I have yet to get around to trying something considerably larger in size. Lund Jorgen Lundman wrote: So you recommend I also do speed test on larger volumes? The test data I had on the b114 server was only 90GB. Previous tests included 500G ufs on zvol etc. It is just it will take 4 days to send it to the b114 server to start with ;) (From Sol10 servers). Lund Dirk Wriedt wrote: Jorgen, what is the size of the sending zfs? I thought replication speed depends on the size of the sending fs, too not only size of the snapshot being sent. Regards Dirk --On Freitag, Mai 22, 2009 19:19:34 +0900 Jorgen Lundman lund...@gmo.jp wrote: Sorry, yes. It is straight; # time zfs send zpool1/leroy_c...@speedtest | nc 172.20.12.232 3001 real19m48.199s # /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/le...@speedtest received 82.3GB stream in 1195 seconds (70.5MB/sec) Sending is osol-b114. Receiver is Solaris 10 10/08 When we tested Solaris 10 10/08 - Solaris 10 10/08 these were the results; zfs send | nc | zfs recv - 1 MB/s tar -cvf /zpool/leroy | nc | tar -xvf - - 2.5 MB/s ufsdump | nc | ufsrestore- 5.0 MB/s So, none of those solutions was usable with regular Sol 10. Note most our volumes are ufs in zvol, but even zfs volumes were slow. Someone else had mentioned the speed was fixed in an earlier release, I had not had a chance to upgrade. But since we wanted to try zfs user-quotas, I finally had the chance. Lund Brent Jones wrote: On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman lund...@gmo.jp wrote: To finally close my quest. I tested zfs send in osol-b114 version: received 82.3GB stream in 1195 seconds (70.5MB/sec) Yeeaahh! That makes it completely usable! Just need to change our support contract to allow us to run b114 and we're set! 
:) Thanks, Lund Jorgen Lundman wrote: We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this) but with the hope that it would also make zfs send usable. Exactly how does build 105 translate to Solaris 10 10/08? My current speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the next version of Solaris 10 will have the improvements. 1 Robert Milkowski wrote: Hello Jorgen, If you look at the list archives you will see that it made a huge difference for some people including me. Now I'm easily able to saturate GbE linke while zfs send|recv'ing. Since build 105 it should be *MUCH* for faster. -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Can you give any details about your data set, what you piped zfs send/receive through (SSH?), hardware/network, etc? I'm envious of your speeds! -- Dirk Wriedt, dirk.wri...@sun.com, Sun Microsystems GmbH Systemingenieur Strategic Accounts Nagelsweg 55, 20097 Hamburg, Germany Tel.: +49-40-251523-132 Fax: +49-40-251523-425 Mobile: +49 172 848 4166 Never been afraid of chances I been takin' - Joan Jett Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Haering -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..
So you recommend I also do speed test on larger volumes? The test data I had on the b114 server was only 90GB. Previous tests included 500G ufs on zvol etc. It is just it will take 4 days to send it to the b114 server to start with ;) (From Sol10 servers). Lund Dirk Wriedt wrote: Jorgen, what is the size of the sending zfs? I thought replication speed depends on the size of the sending fs, too not only size of the snapshot being sent. Regards Dirk --On Freitag, Mai 22, 2009 19:19:34 +0900 Jorgen Lundman lund...@gmo.jp wrote: Sorry, yes. It is straight; # time zfs send zpool1/leroy_c...@speedtest | nc 172.20.12.232 3001 real19m48.199s # /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/le...@speedtest received 82.3GB stream in 1195 seconds (70.5MB/sec) Sending is osol-b114. Receiver is Solaris 10 10/08 When we tested Solaris 10 10/08 - Solaris 10 10/08 these were the results; zfs send | nc | zfs recv - 1 MB/s tar -cvf /zpool/leroy | nc | tar -xvf - - 2.5 MB/s ufsdump | nc | ufsrestore- 5.0 MB/s So, none of those solutions was usable with regular Sol 10. Note most our volumes are ufs in zvol, but even zfs volumes were slow. Someone else had mentioned the speed was fixed in an earlier release, I had not had a chance to upgrade. But since we wanted to try zfs user-quotas, I finally had the chance. Lund Brent Jones wrote: On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman lund...@gmo.jp wrote: To finally close my quest. I tested zfs send in osol-b114 version: received 82.3GB stream in 1195 seconds (70.5MB/sec) Yeeaahh! That makes it completely usable! Just need to change our support contract to allow us to run b114 and we're set! :) Thanks, Lund Jorgen Lundman wrote: We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this) but with the hope that it would also make zfs send usable. Exactly how does build 105 translate to Solaris 10 10/08? My current speed test has sent 34Gb in 24 hours, which isn't great. 
Perhaps the next version of Solaris 10 will have the improvements. 1 Robert Milkowski wrote: Hello Jorgen, If you look at the list archives you will see that it made a huge difference for some people including me. Now I'm easily able to saturate GbE linke while zfs send|recv'ing. Since build 105 it should be *MUCH* for faster. -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Can you give any details about your data set, what you piped zfs send/receive through (SSH?), hardware/network, etc? I'm envious of your speeds! -- Dirk Wriedt, dirk.wri...@sun.com, Sun Microsystems GmbH Systemingenieur Strategic Accounts Nagelsweg 55, 20097 Hamburg, Germany Tel.: +49-40-251523-132 Fax: +49-40-251523-425 Mobile: +49 172 848 4166 Never been afraid of chances I been takin' - Joan Jett Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Haering -- Jorgen Lundman | lund...@lundman.net Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..
Sorry, yes. It is straight; # time zfs send zpool1/leroy_c...@speedtest | nc 172.20.12.232 3001 real19m48.199s # /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/le...@speedtest received 82.3GB stream in 1195 seconds (70.5MB/sec) Sending is osol-b114. Receiver is Solaris 10 10/08 When we tested Solaris 10 10/08 - Solaris 10 10/08 these were the results; zfs send | nc | zfs recv - 1 MB/s tar -cvf /zpool/leroy | nc | tar -xvf - - 2.5 MB/s ufsdump | nc | ufsrestore- 5.0 MB/s So, none of those solutions was usable with regular Sol 10. Note most our volumes are ufs in zvol, but even zfs volumes were slow. Someone else had mentioned the speed was fixed in an earlier release, I had not had a chance to upgrade. But since we wanted to try zfs user-quotas, I finally had the chance. Lund Brent Jones wrote: On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman lund...@gmo.jp wrote: To finally close my quest. I tested zfs send in osol-b114 version: received 82.3GB stream in 1195 seconds (70.5MB/sec) Yeeaahh! That makes it completely usable! Just need to change our support contract to allow us to run b114 and we're set! :) Thanks, Lund Jorgen Lundman wrote: We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this) but with the hope that it would also make zfs send usable. Exactly how does build 105 translate to Solaris 10 10/08? My current speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the next version of Solaris 10 will have the improvements. 1 Robert Milkowski wrote: Hello Jorgen, If you look at the list archives you will see that it made a huge difference for some people including me. Now I'm easily able to saturate GbE linke while zfs send|recv'ing. Since build 105 it should be *MUCH* for faster. 
-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)

Can you give any details about your data set, what you piped zfs send/receive through (SSH?), hardware/network, etc? I'm envious of your speeds!

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
[zfs-discuss] Replacing HDD with larger HDD..
What is the current answer regarding replacing HDDs in a raidz, one at a time, with a larger HDD? The Best-Practises-Wiki seems to suggest it is possible (but perhaps just for mirror, not raidz?). I am currently running osol-b114. I did this test with data files to simulate the situation:

# mkfile 1G disk0[12345]
-rw------T 1 root root 1073741824 May 23 09:19 disk01
-rw------T 1 root root 1073741824 May 23 09:19 disk02
-rw------T 1 root root 1073741824 May 23 09:20 disk03
-rw------T 1 root root 1073741824 May 23 09:20 disk04
-rw------T 1 root root 1073741824 May 23 09:20 disk05

# zpool create grow raidz /var/tmp/disk01 /var/tmp/disk02 /var/tmp/disk03 /var/tmp/disk04 /var/tmp/disk05
# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
grow  4.97G   138K  4.97G    0%  ONLINE  -

# zfs create -o compression=on -o atime=off grow/fs1
# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
grow       153K  3.91G  35.1K  /grow
grow/fs1  33.6K  3.91G  33.6K  /grow/fs1

# zpool status grow
        NAME                 STATE   READ WRITE CKSUM
        grow                 ONLINE     0     0     0
          raidz1             ONLINE     0     0     0
            /var/tmp/disk01  ONLINE     0     0     0
            /var/tmp/disk02  ONLINE     0     0     0
            /var/tmp/disk03  ONLINE     0     0     0
            /var/tmp/disk04  ONLINE     0     0     0
            /var/tmp/disk05  ONLINE     0     0     0

That is our starting position: raidz using 5 1GB disks, giving us a 3.91G file-system in total. Now to replace each one at a time with a 2GB disk.
-rw------T 1 root root 2147483648 May 23 09:36 bigger_disk01
-rw------T 1 root root 2147483648 May 23 09:37 bigger_disk02
-rw------T 1 root root 2147483648 May 23 09:40 bigger_disk03
-rw------T 1 root root 2147483648 May 23 09:40 bigger_disk04
-rw------T 1 root root 2147483648 May 23 09:41 bigger_disk05

# zpool offline grow /var/tmp/disk01
# zpool replace grow /var/tmp/disk01 /var/tmp/bigger_disk01
# zpool status grow
  pool: grow
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat May 23 09:43:51 2009
config:
        NAME                        STATE   READ WRITE CKSUM
        grow                        ONLINE     0     0     0
          raidz1                    ONLINE     0     0     0
            /var/tmp/bigger_disk01  ONLINE     0     0     0  1.04M resilvered
            /var/tmp/disk02         ONLINE     0     0     0
            /var/tmp/disk03         ONLINE     0     0     0
            /var/tmp/disk04         ONLINE     0     0     0
            /var/tmp/disk05         ONLINE     0     0     0

Do the same for all 5 disks:

# zpool status grow
 scrub: resilver completed after 0h0m with 0 errors on Sat May 23 09:46:28 2009
config:
        NAME                        STATE   READ WRITE CKSUM
        grow                        ONLINE     0     0     0
          raidz1                    ONLINE     0     0     0
            /var/tmp/bigger_disk01  ONLINE     0     0     0
            /var/tmp/bigger_disk02  ONLINE     0     0     0
            /var/tmp/bigger_disk03  ONLINE     0     0     0
            /var/tmp/bigger_disk04  ONLINE     0     0     0
            /var/tmp/bigger_disk05  ONLINE     0     0     0  1.04M resilvered

I was somewhat hoping it would just be magical here, but unfortunately:

# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
grow  4.97G  5.35M  4.96G    0%  ONLINE  -

It is still the same size. I would expect it to go to 9G. I tried a few commands to see if you can tell it to make it happen: scrub, zfs unmount/mount, zpool upgrade, etc. No difference.

Then something peculiar happened. I tried to export it and import it again to see if that helped:

# zpool export grow
# zpool import grow
cannot import 'grow': no such pool available

And alas, grow is completely gone, and no amount of import would see it. Oh well.
-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] Replacing HDD with larger HDD..
Rob Logan wrote: you meant to type zpool import -d /var/tmp grow

Bah - of course, I cannot just expect zpool to know what random directory to search. You, Sir, are a genius. Works like a charm, and thank you.

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
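For what it's worth, the whole replace-and-regrow cycle above can be scripted. This is my own sketch, not anything from the thread: build_plan() and run_plan() are invented names, and it assumes the behaviour described above, where the extra capacity only shows up after exporting the pool and re-importing it with -d pointing at the directory holding the file vdevs.

```python
#!/usr/bin/env python3
"""Sketch: grow a file-backed raidz by replacing vdevs one at a time.

Hypothetical helper, not a zpool feature.  It only actually runs the
commands if a zpool binary is present; otherwise it prints the plan.
"""
import shutil
import subprocess

def build_plan(pool, pairs, vdev_dir):
    """Return the zpool command lines for a one-at-a-time replace."""
    cmds = [["zpool", "replace", pool, old, new] for old, new in pairs]
    cmds.append(["zpool", "export", pool])
    # A plain "zpool import" only scans /dev/dsk; file vdevs need -d.
    cmds.append(["zpool", "import", "-d", vdev_dir, pool])
    return cmds

def run_plan(cmds):
    for cmd in cmds:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    pairs = [(f"/var/tmp/disk0{i}", f"/var/tmp/bigger_disk0{i}")
             for i in range(1, 6)]
    plan = build_plan("grow", pairs, "/var/tmp")
    if shutil.which("zpool"):
        run_plan(plan)
    else:
        for cmd in plan:
            print(" ".join(cmd))
```

On builds that later gained the autoexpand pool property, setting autoexpand=on should make the export/import dance unnecessary, though I have not verified that on b114.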
Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..
To finally close my quest: I tested zfs send in the osol-b114 version:

received 82.3GB stream in 1195 seconds (70.5MB/sec)

Yeeaahh! That makes it completely usable! Just need to change our support contract to allow us to run b114 and we're set! :)

Thanks, Lund

Jorgen Lundman wrote: We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this), but with the hope that it would also make zfs send usable. Exactly how does build 105 translate to Solaris 10 10/08? My current speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the next version of Solaris 10 will have the improvements.

Robert Milkowski wrote: Hello Jorgen, If you look at the list archives you will see that it made a huge difference for some people, including me. Now I'm easily able to saturate a GbE link while zfs send|recv'ing. Since build 105 it should be *MUCH* faster.

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
[zfs-discuss] ZFS userquota groupquota test
I have been playing around with the osol-nv-b114 version, and the ZFS user and group quotas. First of all, it is fantastic. Thank you all! (Sun, Ahrens and anyone else involved.)

I'm currently copying over one of the smaller user areas and setting up their quotas, so I have yet to start large-scale testing. But the initial work is very promising. (Just 90G of data, 341694 accounts.)

The userquota@, userused@ and zfs userspace commands are easy to pick up. With a test account with a 50M quota, and a "while [ 1 ]" script copying a 5M file, it reaches about 120M before the user is stopped (as expected). The lazy quota update is not a problem for us, and is less of a problem the more quota the user has (50M is a bit low).

I was unable to get ZFS quota to work with rquota. (I.e., NFS-mount the volume on another server and issue "quota 1234"; it returns nothing.) I assume rquota is just not implemented; not a problem for us. The perl CPAN module Quota does not implement ZFS quotas. :)

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
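As a side note, the userquota@/userused@ properties described above are easy to drive from a script. The following is purely my own sketch (none of these helper names exist anywhere); it assumes "zfs get -H -o value" output such as 55409K or none, and it only talks to zfs if the binary is actually present:

```python
#!/usr/bin/env python3
"""Sketch: poll userused@ via `zfs get -H` and compare it to userquota@.
The property names are the ones the post exercises; the helpers are mine."""
import shutil
import subprocess

_UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

def to_bytes(value):
    """Convert a zfs size string such as 55409K or 1.04M to bytes."""
    if value in ("-", "none"):
        return 0
    if value[-1] in _UNITS:
        return int(float(value[:-1]) * _UNITS[value[-1]])
    return int(value)

def over_quota(used, quota):
    """True once used exceeds quota; quota 0 means unlimited."""
    return quota > 0 and used > quota

def query(dataset, prop):
    out = subprocess.check_output(
        ["zfs", "get", "-H", "-o", "value", prop, dataset], text=True)
    return to_bytes(out.strip())

if __name__ == "__main__" and shutil.which("zfs"):
    used = query("zpool1/leroy", "userused@1234")
    quota = query("zpool1/leroy", "userquota@1234")
    print("over" if over_quota(used, quota) else "ok")
```

With the lazy enforcement the post describes, a watcher like this would report "over" for a while (e.g. at 120M against a 50M quota) before writes actually start failing.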
Re: [zfs-discuss] ZFS userquota groupquota test
Matthew Ahrens wrote: Thanks for the feedback!

Thank you for the work, it sure is great!

This should work, at least on Solaris clients. Perhaps you can only request information about yourself from the client?

Odd, but I just assumed it wouldn't work and didn't check further. But telnet/rquota wasn't running. I do find that, from a server mounting the NFS volume:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem     usage    quota    limit  timeleft  files  quota  limit  timeleft
/export/leroy  55409  1048576  1048576                0      0      0

However, on the x4500 server itself:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem     usage    quota    limit  timeleft  files  quota  limit  timeleft

Of course I should use zfs get userused on the server, but that is probably what confused the situation. Perhaps it is something to do with the mount not thinking it is mounted with quota when local. I could try mountpoint=legacy and explicitly list "rq" when mounting, maybe. But we don't need it to work; it was just different from the legacy behaviour. :)

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] ZFS userquota groupquota test
Oh, I forgot the more important question: importing all the user quota settings. Currently it is a long file of zfs set commands, which is taking a really long time. For example, yesterday's import is still running. Are there bulk-import solutions? Like "zfs set -f file.txt" or similar? If not, I could potentially use the zfs ioctls to write my own bulk-import program? Large imports are rare, but I was just curious if there is a better way to issue large amounts of zfs set commands.

Jorgen Lundman wrote: Matthew Ahrens wrote: Thanks for the feedback! Thank you for the work, it sure is great! This should work, at least on Solaris clients. Perhaps you can only request information about yourself from the client? Odd, but I just assumed it wouldn't work and didn't check further. But telnet/rquota wasn't running. I do find that, from a server mounting the NFS volume:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem     usage    quota    limit  timeleft  files  quota  limit  timeleft
/export/leroy  55409  1048576  1048576                0      0      0

However, on the x4500 server itself:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem     usage    quota    limit  timeleft  files  quota  limit  timeleft

Of course I should use zfs get userused on the server, but that is probably what confused the situation. Perhaps it is something to do with the mount not thinking it is mounted with quota when local. I could try mountpoint=legacy and explicitly list "rq" when mounting, maybe. But we don't need it to work; it was just different from the legacy behaviour. :)

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
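Until something like the "zfs set -f file.txt" wished for above exists, one cheap improvement over a giant shell script is to exec zfs directly (skipping a shell per line) and to run a handful of invocations in parallel. This is a hypothetical sketch of mine; the "uid quota" file format, the file name, and every helper name are my own assumptions:

```python
#!/usr/bin/env python3
"""Sketch: bulk-apply userquota@ settings from a 'uid quota' text file.
There is no real bulk-import interface; this just spawns zfs directly
and runs several invocations concurrently."""
import concurrent.futures
import shutil
import subprocess

def parse_quota_file(lines):
    """Yield (uid, quota) pairs from lines like '1234 50M'.
    Blank lines and '#' comments are skipped."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        uid, quota = line.split()
        yield uid, quota

def apply_quotas(dataset, pairs, workers=8):
    def one(pair):
        uid, quota = pair
        subprocess.run(
            ["zfs", "set", f"userquota@{uid}={quota}", dataset],
            check=True)
    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        list(pool.map(one, pairs))

if __name__ == "__main__" and shutil.which("zfs"):
    with open("quotas.txt") as fh:  # hypothetical input file
        apply_quotas("zpool1/leroy", parse_quota_file(fh))
```

Each setting is still one zfs process, so this only shaves constant overhead; a true bulk path would indeed need the ioctl-level approach the post speculates about.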
Re: [zfs-discuss] Zfs and b114 version
I tried LUpdate 3 times with the same result, then burnt the ISO and installed the old-fashioned way, and it boots fine.

Jorgen Lundman wrote: Most annoying. If su.static really had been static I would be able to figure out what goes wrong. When I boot into miniroot/failsafe it works just fine, including if I set crle to use only libraries in /a/lib:/a/usr/lib (and 64-bit). So startd not launching must be some other file/permission on disk somewhere, but without a single-user shell, it isn't easy to see why. There is nothing in /var/{adm,log}. (But then, syslogd has not started yet.) svc.startd.log merely states:

May 19 14:15:23/1: restarting after interruption

Perhaps it will work better if I just install from CD instead of using LiveUpdate.

Jorgen Lundman wrote: I used LUpdate to create a b114 BE on the spare X4540, and booted it, but alas, I get the following message on boot:

SunOS Release 5.11 Version snv_114 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
/kernel/drv/amd64/pcic: undefined symbol 'cardbus_can_suspend'
WARNING: mod_load: cannot load module 'pcic'
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. Aborting.
May 19 10:16:34 svc.startd[11]: restarting after interruption
Entering System Maintenance Mode
ld.so.1: su.static: fatal: relocation error: file /sbin/su.static: symbol audit_su_init_info: referenced symbol not found
May 19 10:16:51 svc.startd[19]: restarting after interruption
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. Aborting.
I will probably have to wait a little bit longer after all :)

Lund

Jorgen Lundman wrote: The website has not been updated yet to reflect its availability (thus it may not be official yet), but you can get SXCE b114 now from https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/viewproductdetail-start?productref=sol-express_b114-full-x86-sp-...@cds-cds_smi

I don't mind learning something new, but that's even faster! I will try that image and work on my kernel building projects a little later... Thanks!

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] Zfs and b114 version
I used LUpdate to create a b114 BE on the spare X4540, and booted it, but alas, I get the following message on boot:

SunOS Release 5.11 Version snv_114 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
/kernel/drv/amd64/pcic: undefined symbol 'cardbus_can_suspend'
WARNING: mod_load: cannot load module 'pcic'
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. Aborting.
May 19 10:16:34 svc.startd[11]: restarting after interruption
Entering System Maintenance Mode
ld.so.1: su.static: fatal: relocation error: file /sbin/su.static: symbol audit_su_init_info: referenced symbol not found
May 19 10:16:51 svc.startd[19]: restarting after interruption
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. Aborting.

I will probably have to wait a little bit longer after all :)

Lund

Jorgen Lundman wrote: The website has not been updated yet to reflect its availability (thus it may not be official yet), but you can get SXCE b114 now from https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/viewproductdetail-start?productref=sol-express_b114-full-x86-sp-...@cds-cds_smi

I don't mind learning something new, but that's even faster! I will try that image and work on my kernel building projects a little later... Thanks!

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
[zfs-discuss] Zfs and b114 version
http://dlc.sun.com/osol/on/downloads/b114/

This URL makes me think that if I just sit down and figure out how to compile OpenSolaris, I can try b114 now^h^h^h eventually? I am really eager to try out the new quota support.. has someone already tried compiling it, perhaps? How complicated is compiling osol compared to, say, NetBSD/FreeBSD, Linux, etc.? (IRIX and its quickstarting??)

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] Can the new consumer NAS devices run OpenSolaris?
Re-surfacing an old thread. I was wondering myself if there are any home-use commercial NAS devices with ZFS. I did find that there is the Thecus 7700, but it appears to come with Linux and use ZFS via FUSE, which I (perhaps unjustly) don't feel comfortable with :) Perhaps we will start to see more home NAS devices with ZFS options, or at least ones able to run EON?

Joe S wrote: In the last few weeks, I've seen a number of new NAS devices released from companies like HP, QNAP, VIA, Lacie, Buffalo, Iomega, Cisco/Linksys, etc. Most of these are powered by Intel Celeron, Intel Atom, AMD Sempron, Marvell Orion, or Via C7 chips. I've also noticed that most allow a maximum of 1 or 2 GB of RAM. Is it likely that any of these will run OpenSolaris? Has anyone else tried?

http://www.via.com.tw/en/products/embedded/nsd7800/
http://www.hp.com/united-states/digitalentertainment/mediasmart/serverdemo/index-noflash.html
http://www.qnap.com/pro_detail_feature.asp?p_id=108

I prefer one of these instead of the huge PC I have at home.

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..
We finally managed to upgrade the production x4500s to Sol 10 10/08 (unrelated to this), but with the hope that it would also make zfs send usable. Exactly how does build 105 translate to Solaris 10 10/08? My current speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the next version of Solaris 10 will have the improvements.

Robert Milkowski wrote: Hello Jorgen, If you look at the list archives you will see that it made a huge difference for some people, including me. Now I'm easily able to saturate a GbE link while zfs send|recv'ing. Since build 105 it should be *MUCH* faster.

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] User quota design discussion..
Sorry, I did not mean it as a complaint; it just has been that way for us. But if it has been made faster, that would be excellent. ZFS send is very powerful.

Lund

Robert Milkowski wrote: Hello Jorgen, Friday, March 13, 2009, 1:14:12 AM, you wrote:

JL That is a good point, I had not even planned to support quotas for ZFS
JL send, but consider a rescan to be the answer. We don't ZFS send very
JL often as it is far too slow.

Since build 105 it should be *MUCH* faster.

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] User quota design discussion..
Bob Friesenhahn wrote: In order for this to work, ZFS data blocks need to somehow be associated with a POSIX user ID. To start with, the ZFS POSIX layer is implemented on top of a non-POSIX layer which does not need to know about POSIX user IDs. ZFS also supports snapshots and clones.

This I did not know, but now that you point it out, this would be the right way to design it. So the advantage of requiring less ZFS integration no longer applies.

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] User quota design discussion..
Eric Schrock wrote: Note that: 6501037 want user/group quotas on ZFS is already committed to be fixed in build 113 (i.e. in the next month). - Eric

Wow, that would be fantastic. We have the Sun vendors camped out at the data center trying to apply fresh patches. I believe 6798540 [1] fixed the largest issue, but it would be desirable to be able to use just ZFS.

Is this a project needing donations? I see your address is at Sun.com, and we already have 9 x4500s, but maybe you need some pocky, asse, collon or pocari sweat...

Lundy

[1] BugID 6798540: 3-way deadlock happens in ufs filesystem on zvol when writing ufs log

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)
Re: [zfs-discuss] User quota design discussion..
As it turns out, I'm working on zfs user quotas presently, and expect to integrate in about a month. My implementation is in-kernel, integrated with the rest of ZFS, and does not have the drawbacks you mention below.

I merely suggested my design as it may have been something I _could_ have implemented, as it required little ZFS knowledge. (Adding hooks is usually easier.) But naturally that has already been shown not to be the case. A proper implementation is always going to be much more desirable :)

Good, that's the behavior that user quotas will have -- delayed enforcement.

There probably are situations where precision is required, or perhaps historical reasons, but for us delayed enforcement may even be better. Perhaps it would be better for the delivery of an email message that goes over the quota to be allowed to complete writing the entire message, than to abort a write() call somewhere in the middle and return failures all the way back to generating a bounce message. Maybe.. can't say I have thought about it.

My implementation does not have this drawback. Note that you would need to use the recovery mechanism in the case of a system crash / power loss as well. Adding potentially hours to the crash recovery time is not acceptable.

Great! Will there be any particular limits on how many UIDs, or on the size of UIDs, in your implementation? UFS generally does not, but I did note that if UIDs go over 1000, it flips out and changes the quotas file to 128GB in size.

Not to mention that this information needs to get stored somewhere, and dealt with when you zfs send the fs to another system.

That is a good point; I had not even planned to support quotas for ZFS send, but consider a rescan to be the answer. We don't ZFS send very often as it is far too slow.
Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
[zfs-discuss] User quota design discussion..
In the style of a discussion over a beverage, talking about user quotas on ZFS, I recently pondered a design for implementing them after having far too little sleep. It is probably nothing new, but I would be curious what you experts think of the feasibility of implementing such a system, and whether or not it would even realistically work. I'm not suggesting that someone should do the work, or even that I will, but rather in the interest of chatting about it. Feel free to ridicule me as required! :)

Thoughts: Here at work we would like to have user quotas based on uid (and presumably gid) to be able to fully replace the NetApps we run. Current ZFS quotas are not good enough for our situation. We simply can not mount 500,000 file-systems on all the NFS clients. Nor do all the servers we run support mirror-mounts. Nor does the automounter see newly created directories without a full remount.

Current UFS-style user quotas are very exact, to the byte even. We do not need this precision. If a user has 50MB of quota, and they are able to reach 51MB of usage, then that is acceptable to us. Especially since they have to go under 50MB to be able to write new data anyway.

Instead of having complicated code in the kernel layer, slowing down the file-system with locking and semaphores (and perhaps avoiding learning in-depth ZFS code?), I was wondering if a more simplistic setup could be designed that would still be acceptable. I will use the word 'acceptable' a lot. Sorry.

My thoughts are that the ZFS file-system will simply write a 'transaction log' on a pipe. By transaction log I mean uid, gid and 'byte count changed'. And by pipe I don't necessarily mean pipe(2): it could be a fifo, pipe or socket, but currently I'm thinking '/dev/quota' style. User-land will then have a daemon -- whether it is one daemon per file-system or really just one daemon does not matter. This process will open '/dev/quota' and empty the transaction log entries constantly.
It takes the uid,gid entries and updates the byte counts in its database. How we store this database is up to us, but since it is in user-land it has more flexibility, and it is not as critical for it to be fast as it would be in the kernel. The daemon process can also grow its number of threads as demand increases.

Once a user's quota reaches the limit (note here that /the/ call to write() that goes over the limit will succeed, and probably a couple more after it -- this is acceptable), the process will blacklist the uid in the kernel. Future calls to creat/open(O_CREAT)/write/(insert list of calls) will be denied. Naturally calls to unlink/read etc. should still succeed. If the uid goes under the limit, the blacklisting is removed.

If the user-land process crashes or dies, for whatever reason, the buffer of the pipe will grow in the kernel. If the daemon is restarted sufficiently quickly, all is well; it merely needs to catch up. If the pipe ever does get full and items have to be discarded, a full scan of the file-system will be required. Since even with UFS quotas we occasionally need to run 'quotacheck', it would seem this too is acceptable (if undesirable). If you have no daemon process running at all, you have no quotas at all -- but the same can be said about quite a few daemons. The administrators need to adjust their usage.

I can see a complication with doing a rescan. How could this be done efficiently? I don't know if there is a neat way to make this happen internally to ZFS, but from a user-land-only point of view, perhaps a snapshot could be created (synchronised with the /dev/quota pipe reading?) and a scan started on the snapshot, while still processing the kernel log. Once the scan is complete, merge the two sets.

The advantages are that only small hooks are required in ZFS: the byte updates, and the blacklist with checks for being blacklisted. The disadvantages are the loss of precision, and possibly slower rescans? Sanity?
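The accounting and delayed blacklisting described above can be sketched in a few lines. Everything here is invented for illustration -- the class name, the record shape, and the in-memory tables; there is no real /dev/quota device or kernel blacklist ioctl, so the "would ioctl the kernel" step is just a comment:

```python
# Hypothetical sketch of the user-land quota daemon's accounting loop.
# A real daemon would read (uid, gid, delta) records from the kernel
# pipe; here apply() is fed records directly.

class QuotaDaemon:
    def __init__(self, limits):
        self.limits = limits      # uid -> quota limit in bytes
        self.usage = {}           # uid -> current byte count
        self.blacklist = set()    # uids currently denied new writes

    def apply(self, uid, delta):
        """Consume one (uid, byte-delta) record from the transaction log."""
        self.usage[uid] = self.usage.get(uid, 0) + delta
        limit = self.limits.get(uid)
        if limit is None:
            return
        # Delayed enforcement: the write() that crossed the limit has
        # already succeeded; we only block *future* creat/write calls.
        if self.usage[uid] > limit:
            self.blacklist.add(uid)       # would ioctl the kernel here
        elif uid in self.blacklist:
            self.blacklist.discard(uid)   # dropped back under the limit

daemon = QuotaDaemon({1001: 50 * 1024 * 1024})  # 50MB quota for uid 1001
daemon.apply(1001, 51 * 1024 * 1024)            # overshoot is accepted...
assert 1001 in daemon.blacklist                 # ...but uid is now blocked
daemon.apply(1001, -2 * 1024 * 1024)            # user deletes some data
assert 1001 not in daemon.blacklist             # and can write again
```

The imprecision is exactly the "acceptable" kind argued for above: enforcement lags the log by however far the daemon has fallen behind.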
But I do not really know the internals of ZFS, so I might be completely wrong, and everyone is laughing already. Discuss?

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] Introducing zilstat
Interesting, but what does it mean :) The x4500 for mail (NFS vers=3 on ufs on zpool with quotas):

# ./zilstat.ksh
   N-Bytes  N-Bytes/s  N-Max-Bytes/s    B-Bytes  B-Bytes/s  B-Max-Bytes/s
    376720     376720         376720    1286144    1286144        1286144
    419608     419608         419608    1368064    1368064        1368064
    555256     555256         555256    1732608    1732608        1732608
    538808     538808         538808    1679360    1679360        1679360
    626048     626048         626048    1773568    1773568        1773568
    753824     753824         753824    2105344    2105344        2105344
    652632     652632         652632    1716224    1716224        1716224

Fairly constant between 1-2MB/s. That doesn't sound too bad though. It's only got 400 nfsd threads at the moment, but peaks at 1024. Incidentally, what is the highest recommended number of nfsd threads for an x4500 anyway?

Lund

Marion Hakanson wrote:
> The zilstat tool is very helpful, thanks! I tried it on an X4500 NFS server, while extracting a 14MB tar archive, both via an NFS client, and locally on the X4500 itself. Over NFS, said extract took ~2 minutes, and showed peaks of 4MB/sec buffer-bytes going through the ZIL. When run locally on the X4500, the extract took about 1 second, with zilstat showing all zeroes. I wonder if this is a case where that ZIL bypass kicks in for 32K writes, in the local tar extraction. Does zilstat's underlying dtrace include these bypass-writes in the totals it displays? I think if it's possible to get stats on this bypassed data, I'd like to see it as another column (or set of columns) in the zilstat output.
> Regards, Marion

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] Introducing zilstat
Richard Elling wrote:
>> # ./zilstat.ksh
>>    N-Bytes  N-Bytes/s  N-Max-Bytes/s    B-Bytes  B-Bytes/s  B-Max-Bytes/s
>>     376720     376720         376720    1286144    1286144        1286144
>>     419608     419608         419608    1368064    1368064        1368064
>>     555256     555256         555256    1732608    1732608        1732608
>>     538808     538808         538808    1679360    1679360        1679360
>>     626048     626048         626048    1773568    1773568        1773568
>>     753824     753824         753824    2105344    2105344        2105344
>>     652632     652632         652632    1716224    1716224        1716224
>>
>> Fairly constant between 1-2MB/s. That doesn't sound too bad though.
>
> I think your workload would benefit from a fast, separate log device.

Interesting. Today is the first I've heard about it.. one of the x4500s is really, really slow -- something like 15 seconds to do an unlink. But I assumed it was because the ufs inside the zvol is _really_ bloated. Maybe we need to experiment with it on the test x4500.

> Highest recommended is what you need to get the job done. For the most part, the defaults work well. But you can experiment with them and see if you can get better results.

It came shipped with 16. And I'm sorry, but 16 didn't cut it at all :) We set it at 1024 as that was the highest number I found via Google.

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
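For reference, on Solaris 10 the nfsd thread ceiling discussed above is normally set in /etc/default/nfs and takes effect when nfsd is restarted. A sketch of the relevant line -- the value shown is simply the one mentioned in the thread, not a tested recommendation:

```
# /etc/default/nfs (Solaris 10)
# Maximum number of concurrent NFS requests nfsd will service.
NFSD_SERVERS=1024
```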
Re: [zfs-discuss] Replacing HDD in x4500
I've been told we got a BugID, "3-way deadlock happens in ufs filesystem on zvol when writing ufs log", but I can not view the BugID yet (presumably due to my account's weak credentials). Perhaps it isn't something we do wrong; that would be a nice change.

Lund

Jorgen Lundman wrote:
> I assume you've changed the failmode to continue already?
> http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/
> This appears to be new to 10/08, so that is another vote to upgrade. Also interesting that the default is wait, since it almost behaves like it. Not sure why it would block the zpool, zfs and df commands as well, though?
> Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] Replacing HDD in x4500
Thanks for your reply.

While the savecore is working its way up the chain to (hopefully) Sun, the vendor asked us not to use it, so we moved x4500-02's load to x4500-04 and x4500-05. But perhaps moving x4500-02 to Sol 10 10/08 when fixed is the way to go.

The savecore had the usual info, that everything is blocked waiting on locks:

  601* threads trying to get a mutex (598 user, 3 kernel)
        longest sleeping 10 minutes 13.52 seconds earlier
  115* threads trying to get an rwlock (115 user, 0 kernel)
 1678  total threads in allthreads list (1231 user, 447 kernel)
   10  thread_reapcnt
    0  lwp_reapcnt
 1688  nthread

  thread          pri  pctcpu  idle        PID  wchan           command
  0xfe8000137c80   60   0.000  -9m44.88s     0  0xfe84d816cdc8  sched
  0xfe800092cc80   60   0.000  -9m44.52s     0  0xc03c6538      sched
  0xfe8527458b40   59   0.005  -1m41.38s  1217  0xb02339e0      /usr/lib/nfs/rquotad
  0xfe8527b534e0   60   0.000  -5m4.79s    402  0xfe84d816cdc8  /usr/lib/nfs/lockd
  0xfe852578f460   60   0.000  -4m59.79s   402  0xc0633fc8      /usr/lib/nfs/lockd
  0xfe8532ad47a0   60   0.000  -10m4.40s   623  0xfe84bde48598  /usr/lib/nfs/nfsd
  0xfe8532ad3d80   60   0.000  -10m9.10s   623  0xfe84d816ced8  /usr/lib/nfs/nfsd
  0xfe8532ad3360   60   0.000  -10m3.77s   623  0xfe84d816cde0  /usr/lib/nfs/nfsd
  0xfe85341e9100   60   0.000  -10m6.85s   623  0xfe84bde48428  /usr/lib/nfs/nfsd
  0xfe85341e8a40   60   0.000  -10m4.76s   623  0xfe84d816ced8  /usr/lib/nfs/nfsd

  SolarisCAT(vmcore.0/10X) tlist sobj locks | grep nfsd | wc -l
  680

  scl_writer = 0xfe8000185c80 - locking thread

  thread 0xfe8000185c80
  kernel thread: 0xfe8000185c80  PID: 0  cmd: sched
  t_wchan: 0xfbc8200a  sobj: condition var (from genunix:bflush+0x4d)
  t_procp: 0xfbc22dc0 (proc_sched)  p_as: 0xfbc24a20 (kas)  zone: global
  t_stk: 0xfe8000185c80  sp: 0xfe8000185aa0  t_stkbase: 0xfe8000181000
  t_pri: 99 (SYS)  pctcpu: 0.00  t_lwp: 0x0  psrset: 0  last CPU: 0
  idle: 44943 ticks (7 minutes 29.43 seconds)
  start: Tue Jan 27 23:44:21 2009
  age: 674 seconds (11 minutes 14 seconds)
  tstate: TS_SLEEP - awaiting an event
  tflg:   T_TALLOCSTK - thread structure allocated from stk
  tpflg:  none set
  tsched: TS_LOAD - thread is in memory
          TS_DONT_SWAP - thread/LWP should not be swapped
  pflag:  SSYS - system resident process
  pc: 0xfb83616f unix:_resume_from_idle+0xf8 resume_return
  startpc: 0xeff889e0 zfs:spa_async_thread+0x0

  unix:_resume_from_idle+0xf8 resume_return()
  unix:swtch+0x12a()
  genunix:cv_wait+0x68()
  genunix:bflush+0x4d()
  genunix:ldi_close+0xbe()
  zfs:vdev_disk_close+0x6a()
  zfs:vdev_close+0x13()
  zfs:vdev_raidz_close+0x26()
  zfs:vdev_close+0x13()
  zfs:vdev_reopen+0x1d()
  zfs:spa_async_reopen+0x5f()
  zfs:spa_async_thread+0xc8()
  unix:thread_start+0x8()
  -- end of kernel thread's stack --

Blake wrote:
> I'm not an authority, but on my 'vanilla' filer, using the same controller chipset as the thumper, I've been in really good shape since moving to zfs boot in 10/08 and doing 'zpool upgrade' and 'zfs upgrade' to all my mirrors (3 3-way). I'd been having similar troubles to yours in the past. My system is pretty puny next to yours, but it's been reliable now for slightly over a month.
>
> On Tue, Jan 27, 2009 at 12:19 AM, Jorgen Lundman lund...@gmo.jp wrote:
>> The vendor wanted to come in and replace an HDD in the 2nd X4500, as it was constantly busy, and since our x4500 has always died miserably in the past when a HDD dies, they wanted to replace it before the HDD actually died. The usual was done: HDD replaced, resilvering started and ran for about 50 minutes. Then the system hung, same as always -- all ZFS-related commands would just hang and do nothing. The system is otherwise fine and completely idle. The vendor for some reason decided to fsck the root-fs, not sure why as it is mounted with logging, and also decided it would be best to do so from a CD-ROM boot. Anyway, that was 12 hours ago and the x4500 is still down. I think they have it at a single-user prompt, resilvering again. (I also noticed they'd decided to break the mirror of the root disks for some very strange reason.)
>>
>> It still shows:
>>
>>        raidz1                  DEGRADED     0     0     0
>>          c0t1d0                ONLINE       0     0     0
>>          replacing             UNAVAIL      0     0     0  insufficient replicas
>>            c1t1d0s0/o          OFFLINE      0     0     0
>>            c1t1d0              UNAVAIL      0     0     0  cannot open
>>
>> So I am pretty sure it'll hang again sometime soon. What is interesting though is that this is on x4500-02, and all our previous troubles mailed to the list were regarding our first x4500. The hardware is all different, but identical. Solaris 10 5/08. Anyway, I think they want to boot CDrom
Re: [zfs-discuss] Replacing HDD in x4500
I assume you've changed the failmode to continue already?

http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/

This appears to be new to 10/08, so that is another vote to upgrade. Also interesting that the default is wait, since it almost behaves like it. Not sure why it would block the zpool, zfs and df commands as well, though?

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
[zfs-discuss] Replacing HDD in x4500
The vendor wanted to come in and replace an HDD in the 2nd X4500, as it was constantly busy, and since our x4500 has always died miserably in the past when a HDD dies, they wanted to replace it before the HDD actually died.

The usual was done: HDD replaced, resilvering started and ran for about 50 minutes. Then the system hung, same as always -- all ZFS-related commands would just hang and do nothing. The system is otherwise fine and completely idle.

The vendor for some reason decided to fsck the root-fs, not sure why as it is mounted with logging, and also decided it would be best to do so from a CD-ROM boot. Anyway, that was 12 hours ago and the x4500 is still down. I think they have it at a single-user prompt, resilvering again. (I also noticed they'd decided to break the mirror of the root disks for some very strange reason.)

It still shows:

        raidz1                    DEGRADED     0     0     0
          c0t1d0                  ONLINE       0     0     0
          replacing               UNAVAIL      0     0     0  insufficient replicas
            c1t1d0s0/o            OFFLINE      0     0     0
            c1t1d0                UNAVAIL      0     0     0  cannot open

So I am pretty sure it'll hang again sometime soon. What is interesting though is that this is on x4500-02, and all our previous troubles mailed to the list were regarding our first x4500. The hardware is all different, but identical. Solaris 10 5/08.

Anyway, I think they want to boot the CD-ROM to fsck root again for some reason, but since customers have been without their mail for 12 hours, they can go a little longer, I guess.

What I was really wondering: has there been any progress or patches regarding the system always hanging whenever a HDD dies (or is replaced, it seems)? It really is rather frustrating.

Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 vs AVS ?
Sorry, I popped up to Hokkaido for a holiday. I want to thank you all for the replies.

I mentioned AVS as I thought it was the only product close to enabling us to do a (makeshift) fail-over setup. We have 5-6 ZFS file-systems, and 5-6 zvols with UFS (for quotas). To do zfs send snapshots every minute might perhaps be possible (just not very attractive), but if the script dies at any time, you need to resend the full volumes, and this currently takes 5 days (even using nc).

Since we are forced by our vendor to run Sol 10, it sounds like AVS is not an option for us. If we were interested in finding a method to replicate data to a 2nd x4500, what other options are there for us? We do not need instant updates, just someplace to fail over to when the x4500 panics, or a HDD dies (which equals a panic). It currently takes 2 hours to fsck the UFS volumes after a panic (and yes, they are logging; it is actually just the one UFS volume that always needs fsck).

Our vendor has mentioned Veritas Volume Replicator, but I was under the impression that Veritas is a whole different set of tools to zfs/zpool.

Lund

Jim Dunham wrote:

On Sep 11, 2008, at 5:16 PM, A Darren Dunham wrote: On Thu, Sep 11, 2008 at 04:28:03PM -0400, Jim Dunham wrote: On Sep 11, 2008, at 11:19 AM, A Darren Dunham wrote: On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:

The issue with any form of RAID 1 is that the instant a disk fails out of the RAID set, with the next write I/O to the remaining members of the RAID set, the failed disk (and its replica) are instantly out of sync.

Does raidz fall into that category?

Yes. The key reason is that as soon as ZFS (or other mirroring software) detects a disk failure in a RAID 1 set, it will stop writing to the failed disk, which also means it will stop writing to the replica of the failed disk. From the point of view of the remote node, the replica of the failed disk is no longer being updated.

Now, if replication was stopped, or the primary node powered off or panicked, then during the import of the ZFS storage pool on the secondary node the replica of the failed disk must not be part of the ZFS storage pool, as its data is stale. This happens automatically, since the ZFS metadata on the remaining disks has already given up on this member of the RAID set.

Then I misunderstood what you were talking about. Why the restriction to RAID 1 in your statement?

No restriction. I meant to say RAID 1 or greater. Even for a mirror, the data is stale and it's removed from the active set.

I thought you were talking about block parity run across columns...
-- Darren

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.
work: 781-442-4042  cell: 603.724.2972

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
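The minute-by-minute "zfs send" replication mentioned above boils down to one full stream followed by incrementals between successive snapshots. A rough sketch that only builds the command lines -- the pool, snapshot, and host names are made up, a real script needs error handling and the actual snapshot/cleanup steps, and (as noted) if a full send dies part-way there is no way to resume it:

```python
# Hypothetical helper: construct the shell pipeline for one replication
# step. prev_snap=None means no common snapshot exists on the target,
# forcing a full send (the multi-day case described above).

def send_cmd(fs, prev_snap, snap, host):
    if prev_snap is None:
        # First pass: full stream of the newest snapshot.
        send = f"zfs send {fs}@{snap}"
    else:
        # Steady state: incremental between the last two snapshots.
        send = f"zfs send -i {fs}@{prev_snap} {fs}@{snap}"
    return f"{send} | ssh {host} zfs receive -F {fs}"

cmd = send_cmd("zpool1/mail", "t0100", "t0101", "x4500-02")
assert cmd == ("zfs send -i zpool1/mail@t0100 zpool1/mail@t0101"
               " | ssh x4500-02 zfs receive -F zpool1/mail")
```

The fragility the post complains about falls out of the first branch: lose the common snapshot on either side and the next step degrades from seconds back to a full multi-day send.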
[zfs-discuss] x4500 vs AVS ?
If we get two x4500s, and look at AVS, would it be possible to:

1) Set up AVS to replicate zfs and zvol (ufs) from 01 to 02? Supported by Sol 10 5/08?

Assuming 1, we would set up a home-made IP fail-over so that, should 01 go down, all clients are redirected to 02.

2) Fail-back: are there methods in AVS to handle fail-back? Since 02 has been used, it will have newer/modified files, and will need to replicate backwards until synchronised, before fail-back can occur.

We did ask our vendor, but we were just told that AVS does not support the x4500.

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
So it does appear that it is zpool that hangs, possibly during resilvering (we lost a HDD at midnight, which is what started all this). After boot:

x4500-02:~# zpool status -x
  pool: zpool1
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 11.10% done, 2h11m to go
config:

        NAME                      STATE     READ WRITE CKSUM
        zpool1                    DEGRADED     0     0     0
          raidz1                  ONLINE       0     0     0
          [snip]
            c7t3d0                ONLINE       0     0     0
          replacing               UNAVAIL      0     0     0  insufficient replicas
            c8t3d0s0/o            UNAVAIL      0     0     0  cannot open
            c8t3d0                UNAVAIL      0     0     0  cannot open
          raidz1                  ONLINE       0     0     0

You can run zpool for about 4-5 minutes, then the commands start to hang. For example, I tried to issue:

# zpool offline zpool1 c8t3d0

.. and the system stops responding.

# mdb -k
> ::ps ! grep pool
R    732    722    732    662      0 0x4a004000 b92a8030 zpool
> b92a8030::walk thread | ::findstack -v
stack pointer for thread fe85285d07e0: fe800283fc40
[ fe800283fc40 _resume_from_idle+0xf8() ]
  fe800283fc70 swtch+0x12a()
  fe800283fc90 cv_wait+0x68()
  fe800283fcc0 spa_config_enter+0x50()
  fe800283fce0 spa_vdev_enter+0x2a()
  fe800283fd10 vdev_offline+0x29()
  fe800283fd40 zfs_ioc_vdev_offline+0x58()
  fe800283fd80 zfsdev_ioctl+0x13e()
  fe800283fd90 cdev_ioctl+0x1d()
  fe800283fdb0 spec_ioctl+0x50()
  fe800283fde0 fop_ioctl+0x25()
  fe800283fec0 ioctl+0xac()
  fe800283ff10 sys_syscall32+0x101()

Similarly, nfsd:

> ::ps ! grep nfsd
R    548      1    548    548      1 0x4200 b92ad6d0 nfsd
> b92ad6d0::walk thread | ::findstack -v
stack pointer for thread 9af8e540: fe8001046cc0
[ fe8001046cc0 _resume_from_idle+0xf8() ]
  fe8001046cf0 swtch+0x12a()
  fe8001046d40 cv_wait_sig_swap_core+0x177()
  fe8001046d50 cv_wait_sig_swap+0xb()
  fe8001046da0 cv_waituntil_sig+0xd7()
  fe8001046e50 poll_common+0x420()
  fe8001046ec0 pollsys+0xbe()
  fe8001046f10 sys_syscall32+0x101()

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
OK, so I tried installing 138053-02, and unmounting/unsharing for the entire resilvering process; meanwhile, onsite support decided to replace the mainboard for some reason (not that I was full of confidence here)... and between us, it has actually been up for 2 hours, and has a clean zpool status.

Going to get some sleep, and really hope it has been fixed. Thank you to everyone who helped.

Lund

Jorgen Lundman wrote:
>> Anyway, it has almost rebooted, so I need to go remount everything.
> Not that it wants to stay up for longer than ~20 mins, then hangs -- in that all IO hangs, including nfsd. I thought this might have been related:
> http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1
> # /usr/X11/bin/scanpci | /usr/sfw/bin/ggrep -A1 "vendor 0x11ab device 0x6081"
> pci bus 0x0001 cardnum 0x01 function 0x00: vendor 0x11ab device 0x6081
>  Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller
> But it claims resolved for our version:
> SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc
> Perhaps I should see if there are any recommended patches for Sol 10 5/08?
> Lund

-- 
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
[zfs-discuss] x4500 dead HDD, hung server, unable to boot.
SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc

Admittedly we are not having much luck with the x4500s. This time it was the new x4500, running Solaris 10 5/08. Drive /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd30) stopped responding, and even after a hard reset it would simply repeat retryable, reset, and fatal messages forever, so we were unable to log in on the console.

Again we ended up with the problem of knowing which HDD is actually broken. It turns out to be drive #40. (Has anyone got a map we can print? Since we couldn't boot it, any Unix commands needed to map are a bit useless, nor do we have the hd utility.)

That a HDD died in the first month of operation is understandable, but does it really have to take the whole server with it? Not to mention stop it from booting. Eventually the NOC staff guessed the correct drive from the blinking of LEDs (no LED was RED), and we were able to boot.

Log outputs:

Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx5: device on port 3 reset: device disconnected or device error
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix   port 3: device reset
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix   port 3: link lost
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix   port 3: link established
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx5: error on port 3:
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info]   device error
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info]   device disconnected
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info]   device connected
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info]   EDMA self disabled
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd30):
Aug 11 08:47:59 x4500-02.unix   Error for Command: read    Error Level: Retryable
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice]   Requested Block: 439202    Error Block: 439202
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice]   Vendor: ATA    Serial Number:
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice]   Sense Key: No Additional Sense
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

scrub: resilver in progress, 10.27% done, 2h14m to go

Perhaps not related, but equally annoying:

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Aug 11 08:16:32.3925 64da6f29-4dda-44aa-e9ca-ad7054aaeaa1 ZFS-8000-D3
Aug 11 09:08:18.7834 086e6170-e4c7-c66b-c908-e37840db7e96 ZFS-8000-D3
# fmdump -v -u 086e6170-e4c7-c66b-c908-e37840db7e96
TIME                 UUID                                 SUNW-MSG-ID
Aug 11 09:08:18.7834 086e6170-e4c7-c66b-c908-e37840db7e96 ZFS-8000-D3
^C^Z^\

Alas, kill -9 does not kill fmdump either, and it appears to lock the server (as well). I will avoid the command for now, as it definitely hangs the server every time. Hard reset done again.

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
> the 'hd' utility on the tools and drivers cd produces the attached output on thumper.

Clearly I need to find and install this utility, but even then, that seems to just add yet another way to number the drives. The message I get from the kernel is:

/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd30):

And I need to get the answer "40". The hd output additionally gives me "sdar"?

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
> See http://www.sun.com/servers/x64/x4500/arch-wp.pdf page 21.
> Ian

Referring to page 20? That does show the drive order, just like it does on the box, but not how to map from the kernel message to the drive slot number.

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
> Does the SATA controller show any information in its log (if you go into the controller BIOS, if there is one)? Seeing more reports of full system hangs from an unresponsive drive makes me very concerned about bringing a 4500 into our environment :(

Not that I can see. Rebooting the new x4500 for the 6th time now as it keeps hanging on IO. (The box is 100% idle, but any IO commands like zpool/zfs/fmdump etc. will just hang.) I have absolutely no idea why it hangs now; we have pulled out the replacement drive to see if it stays up (in case it is a drive-channel problem).

The most disappointing aspect of all this is the incredibly poor support we have had from our vendor (compared to the NetApp support that we have had in the past). I would have thought being the biggest ISP in Japan would mean we'd be interesting to Sun, even if just a little bit. I suspect we are one of the first to try the x4500 here as well.

Anyway, it has almost rebooted, so I need to go remount everything.

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
Jorgen Lundman wrote:
> Anyway, it has almost rebooted, so I need to go remount everything.

Not that it wants to stay up for longer than ~20 mins, then hangs -- in that all IO hangs, including nfsd. I thought this might have been related:

http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1

# /usr/X11/bin/scanpci | /usr/sfw/bin/ggrep -A1 "vendor 0x11ab device 0x6081"
pci bus 0x0001 cardnum 0x01 function 0x00: vendor 0x11ab device 0x6081
 Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller

But it claims resolved for our version:

SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc

Perhaps I should see if there are any recommended patches for Sol 10 5/08?

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
James C. McPherson wrote:
> One question to ask is: are you seeing the same messages on your system that are shown in that Sunsolve doc? Not just the write errors, but the whole sequence.

Unfortunately, I get no messages at all. I/O just stops, but login shells are fine as long as I don't issue commands that query zfs/zpool in any way. Nothing on console, dmesg, or the various log files. I just booted with -k since it happens so frequently. Most likely it is not related to that bug. Having to do hard resets (well, from the ILOM) doesn't feel good.

> Can you force a crash dump when the system hangs? If you can, then you could provide that to the support engineer who has accepted the call you've already logged with Sun's support organisation. You _did_ log a call, didn't you?

A crash dump will happen next time (30 mins or so), and we can only log a call with our vendor, who will push it to Sun if they feel like it. Although, since we do have SunSolve logins, can we bypass the middleman, avoid the whole translation fiasco, and log directly with Sun?

Lund

-- 
Jorgen Lundman       | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
Re: [zfs-discuss] Replacing the boot HDDs in x4500
Ross wrote: I do think a zfs import after booting from the new drives should work fine, and it doesn't automatically upgrade the pool, so you can still go back to snv_70b if needed.

Alas, it would be a downgrade, which is why I think it will fail.

PS. In your first post you said you had no time to copy the filesystem, so why are you trying to use send/receive? Both rsync and send/receive will take a long time to complete.

A zfs send of the /zvol/ufs volume would take 2 days, and currently the server panics at least once a day. There appears to be no way to resume a half-transferred zfs send, so I am rsyncing smaller pieces instead. zfs send -i only works if you already have a full copy, which we can't get, as above.
Re: [zfs-discuss] Replacing the boot HDDs in x4500
Ross wrote: Not if you don't upgrade the pool it won't. ZFS can import and work with an old version of the filesystem fine. The manual page for zpool upgrade says: Older versions can continue to be used. Just import it on Solaris 5/08 without doing the upgrade. Your ZFS pool will be available and can be served out from the new version. If you do find any problems (which I wouldn't expect, to be honest), you can plug your old snv_70b boot disk back in if necessary.

The current (old) server has a version 2 ZFS filesystem; the new boot HDDs/OS only have ZFS version 1. I do not think ZFS version 1 will read version 2, and I see no tool for converting a version 2 filesystem down to version 1.
[zfs-discuss] Replacing the boot HDDs in x4500
We are having some issues copying the existing data from our Sol 11 snv_70b x4500 to the new Sol 10 5/08 x4500. With all the panics and crashes, we have had no chance to completely copy even a single filesystem (the ETA for that is about 48 hours). What are the chances that I can zpool import all filesystems if I were to simply drop the two mirrored Sol 10 5/08 boot HDDs into the x4500 and reboot? I assume the Sol 10 5/08 zpool version would be newer, so in theory it should work. Comments?
Re: [zfs-discuss] x4500 performance tuning.
We have had a disk fail in the existing x4500, and it sure froze the whole server. I believe it is an OS problem which should have been fixed in a version newer than ours. If you want me to test it on the new x4500, since it runs Sol 10 5/08, I can do that.

Ross wrote: Hi Jorgen, This isn't an answer to your problem I'm afraid, but a request for you to do a test when you get your new x4500. Could you try pulling a SATA drive to see if the system hangs? I'm finding Solaris just locks up if I pull a drive connected to the Supermicro AOC-SAT2-MV8 card, and I was under the belief that it uses the same chipset as the Thumper. I'm hoping this is just a driver problem, or a problem specific to the Supermicro card, but since our loan x4500 went back to Sun I'm unable to test this myself, and if the x4500's do lock up I'm a bit concerned about how they handle hardware failures. thanks, Ross
Re: [zfs-discuss] x4500 performance tuning.
Since we were drowning, we decided to go ahead and reboot with my guesses, even though I have not heard any expert opinions on the changes. (Also, 3 mins was way underestimated: it takes 12 minutes to reboot our x4500.) The new values are (originals in parentheses):

set bufhwm_pct=10 (2%)
set maxusers=4096 (2048)
set ndquot=5048000 (50480)
set ncsize=1038376 (129797)
set ufs_ninode=1038376 (129797)

It does appear to run somewhat better, but it is hard to tell. 7 out of 10 tries, statvfs64 takes less than 2 seconds, but I did get as high as 14s. However, 2 hours later the x4500 hung: pingable, but no console and no NFS response. The LOM was fine, and I performed a remote reboot. Since then it has stayed up 5 hours.

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
521 daemon 7404K 6896K sleep 60 -20 0:25:03 3.1% nfsd/754
Total: 1 processes, 754 lwps, load averages: 0.82, 0.79, 0.79
CPU states: 90.6% idle, 0.0% user, 9.4% kernel, 0.0% iowait, 0.0% swap
Memory: 16G real, 829M free, 275M swap in use, 16G swap free

10191915 total name lookups (cache hits 82%)
maxsize 1038376
maxsize reached 993770

(I increased it by nearly x10 and it still gets a high 'reached' count.)

Lund

Jorgen Lundman wrote:
> We are having slow performance with the UFS volumes on the x4500. They
> are slow even on the local server. Which makes me think it is (for once)
> not NFS related. [...]
[zfs-discuss] x4500 performance tuning.
We are having slow performance with the UFS volumes on the x4500. They are slow even on the local server, which makes me think it is (for once) not NFS related.

Current settings:

SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

# cat /etc/release
Solaris Express Developer Edition 9/07 snv_70b X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 30 August 2007

NFSD_SERVERS=1024
LOCKD_SERVERS=128

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
12249 daemon 7204K 6748K sleep 60 -20 54:16:26 14% nfsd/731
load averages: 2.22, 2.32, 2.42 12:31:35
63 processes: 62 sleeping, 1 on cpu
CPU states: 68.7% idle, 0.0% user, 31.3% kernel, 0.0% iowait, 0.0% swap
Memory: 16G real, 1366M free, 118M swap in use, 16G swap free

/etc/system: set ndquot=5048000

We have a setup like:

/export/zfs1
/export/zfs2
/export/zfs3
/export/zfs4
/export/zfs5
/export/zdev/vol1/ufs1
/export/zdev/vol2/ufs2
/export/zdev/vol3/ufs3

What is interesting is that if I run df, it will display everything at normal speed, but pause before the vol1/ufs1 filesystem. truss confirms that statvfs64() is slow (5 seconds usually). All other ZFS and UFS filesystems behave normally. vol1/ufs1 is the most heavily used UFS filesystem.

Disk: /dev/zvol/dsk/zpool1/ufs1 991G 224G 758G 23% /export/ufs1
Inodes: /dev/zvol/dsk/zpool1/ufs1 37698475 2504405360% /export/ufs1

Possible problems:

# vmstat -s
866193018 total name lookups (cache hits 57%)

# kstat -n inode_cache
module: ufs instance: 0
name: inode_cache class: ufs
maxsize 129797
maxsize reached 269060
thread idles 319098740
vget idles 62136

This leads me to think we should consider setting:

set ncsize=259594 (doubled... are there better values?)
set ufs_ninode=259594

in /etc/system, and rebooting. But it is costly to reboot based only on my guess. Do you have any other suggestions to explore? Will this help?

Sincerely, Jorgen Lundman
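For concreteness, the kind of change being discussed in this thread would look roughly like this in /etc/system. The values are the thread's own guesses, not recommendations; a reboot is required, and `kstat -n inode_cache` should be re-checked afterwards to see whether "maxsize reached" still climbs:

```
* Sketch of proposed /etc/system tuning for the x4500 (thread's estimates).
* ncsize sizes the directory name lookup cache (DNLC); doubled from 129797.
set ncsize=259594
* ufs_ninode sizes the UFS inode cache; conventionally kept equal to ncsize.
set ufs_ninode=259594
* ndquot sizes the UFS quota structure table; already raised on this box.
set ndquot=5048000
```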
Re: [zfs-discuss] x4500 performance tuning.
SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

That's a very old release, have you considered upgrading? Ian.

It was the absolute latest version available when we received the x4500, and now it is live and supporting a large number of customers. However, the 2nd unit will arrive next week (it will be Sol 10 5/08, as that is the only/newest OS version the vendor will support). So yes, in a way we will move to a newer version, if we can work out a good way to migrate from one x4500 to another x4500 :)

But in the meanwhile, we were hoping we could do some kernel tweaking, reboot (3 minutes of downtime), and have it perform a little better. It would be nice to have someone who knows more than me give their opinion as to whether my guesses have any chance of succeeding. For example, with Postfix delivering mail, system calls like open() and fdsync() are currently taking upwards of 7 seconds to complete.

Lund
Re: [zfs-discuss] x4500 panic report.
Today we had another panic; at least it was during work time :) Just a shame the 999GB UFS takes 80+ minutes to fsck. (Yes, it is mounted 'logging'.)

panic[cpu3]/thread=ff001e70dc80: free: freeing free block, dev:0xb60024, block:13144, ino:1737885, fs:/export/saba1

ff001e70d500 genunix:vcmn_err+28 ()
ff001e70d550 ufs:real_panic_v+f7 ()
ff001e70d5b0 ufs:ufs_fault_v+1d0 ()
ff001e70d6a0 ufs:ufs_fault+a0 ()
ff001e70d770 ufs:free+38f ()
ff001e70d830 ufs:indirtrunc+260 ()
ff001e70dab0 ufs:ufs_itrunc+738 ()
ff001e70db60 ufs:ufs_trans_itrunc+128 ()
ff001e70dbf0 ufs:ufs_delete+3b0 ()
ff001e70dc60 ufs:ufs_thread_delete+da ()
ff001e70dc70 unix:thread_start+8 ()

syncing file systems...
panic[cpu3]/thread=ff001e70dc80: panic sync timeout
dumping to /dev/dsk/c6t0d0s1, offset 65536, content: kernel

> $c
vpanic()
vcmn_err+0x28(3, f783a128, ff001e70d678)
real_panic_v+0xf7(0, f783a128, ff001e70d678)
ufs_fault_v+0x1d0(ff04facf65c0, f783a128, ff001e70d678)
ufs_fault+0xa0()
free+0x38f(ff001e70d8d0, a6a7358, 2000, 89)
indirtrunc+0x260(ff001e70d8d0, a6a42b8, , 0, 89)
ufs_itrunc+0x738(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_trans_itrunc+0x128(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_delete+0x3b0(fffed20e2a00, ff0550b9fde0, 1)
ufs_thread_delete+0xda(64704840)
thread_start+8()

> ::panicinfo
cpu 3
thread ff001e70dc80
message free: freeing free block, dev:0xb60024, block:13144, ino:1737885, fs:/export/saba1
rdi f783a128  rsi ff001e70d678  rdx f783a128  rcx ff001e70d678
r8 f783a128  r9 0  rax 3  rbx 0  rbp ff001e70d4d0
r10 fffec3d40580  r11 ff001e70dc80  r12 f783a128  r13 ff001e70d678
r14 3  r15 f783a128
fsbase 0  gsbase fffec3d40580  ds 4b  es 4b  fs 0  gs 1c3
trapno 0  err 0  rip fb83c860  cs 30  rflags 246  rsp ff001e70d488  ss 38
gdt_hi 0  gdt_lo 81ef  idt_hi 0  idt_lo 7fff  ldt 0  task 70
cr0 8005003b  cr2 fed0e010  cr3 2c0  cr4 6f8

Jorgen Lundman wrote:
> On Saturday the X4500 system paniced, and rebooted. For some reason the
> /export/saba1 UFS partition was corrupt, and needed fsck. [...]
[zfs-discuss] x4500 panic report.
On Saturday the X4500 system paniced, and rebooted. For some reason the /export/saba1 UFS partition was corrupt and needed fsck, which is why it did not come back online. /export/saba1 is mounted logging,noatime, so fsck should never (-ish) be needed.

SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

/export/saba1 on /dev/zvol/dsk/zpool1/saba1 read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024 on Sat Jul 5 08:48:54 2008

One possible related bug: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138

What would be the best solution? Go back to the latest Solaris 10 and pass it on to Sun support, or find a patch for this problem?

Panic dump follows:

-rw-r--r-- 1 root root 2529300 Jul 5 08:48 unix.2
-rw-r--r-- 1 root root 10133225472 Jul 5 09:10 vmcore.2

# mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba uhci s1394 qlc fctl nca lofs zfs random cpc crypto fcip fcp logindmux nsctl sdbc ptm sv ii sppp rdc nfs ]

> $c
vpanic()
vcmn_err+0x28(3, f783ade0, ff001e737aa8)
real_panic_v+0xf7(0, f783ade0, ff001e737aa8)
ufs_fault_v+0x1d0(fffed0bfb980, f783ade0, ff001e737aa8)
ufs_fault+0xa0()
dqput+0xce(1db26ef0)
dqrele+0x48(1db26ef0)
ufs_trans_dqrele+0x6f(1db26ef0)
ufs_idle_free+0x16d(ff04f17b1e00)
ufs_idle_some+0x152(3f60)
ufs_thread_idle+0x1a1()
thread_start+8()

> ::cpuinfo
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC
0 fbc2fc10 1b 0 0 60 no no t-0 ff001e737c80 sched
1 fffec3a0a000 1f 1 0 -1 no no t-0 ff001e971c80 (idle)
2 fffec3a02ac0 1f 0 0 -1 no no t-1 ff001e9dbc80 (idle)
3 fffec3d60580 1f 0 0 -1 no no t-1 ff001ea50c80 (idle)

> ::panicinfo
cpu 0
thread ff001e737c80
message dqput: dqp->dq_cnt == 0
rdi f783ade0  rsi ff001e737aa8  rdx f783ade0  rcx ff001e737aa8
r8 f783ade0  r9 0  rax 3  rbx 0  rbp ff001e737900
r10 fbc26fb0  r11 ff001e737c80  r12 f783ade0  r13 ff001e737aa8
r14 3  r15 f783ade0
fsbase 0  gsbase fbc26fb0  ds 4b  es 4b  fs 0  gs 1c3
trapno 0  err 0  rip fb83c860  cs 30  rflags 246  rsp ff001e7378b8  ss 38
gdt_hi 0  gdt_lo e1ef  idt_hi 0  idt_lo 77c00fff  ldt 0  task 70
cr0 8005003b  cr2 fee7d650  cr3 2c0  cr4 6f8

> ::msgbuf
quota_ufs: over hard disk limit (pid 600, uid 178199, inum 941499, fs /export/zero1)
quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs /export/zero1)
panic[cpu0]/thread=ff001e737c80: dqput: dqp->dq_cnt == 0
ff001e737930 genunix:vcmn_err+28 ()
ff001e737980 ufs:real_panic_v+f7 ()
ff001e7379e0 ufs:ufs_fault_v+1d0 ()
ff001e737ad0 ufs:ufs_fault+a0 ()
ff001e737b00 ufs:dqput+ce ()
ff001e737b30 ufs:dqrele+48 ()
ff001e737b70 ufs:ufs_trans_dqrele+6f ()
ff001e737bc0 ufs:ufs_idle_free+16d ()
ff001e737c10 ufs:ufs_idle_some+152 ()
ff001e737c60 ufs:ufs_thread_idle+1a1 ()
ff001e737c70 unix:thread_start+8 ()
syncing file systems...
Re: [zfs-discuss] x4500 panic report.
Since the panic stack only ever goes through ufs, you should log a call with Sun support.

We do have support, but they only speak Japanese, and I'm still quite poor at it. But I have started the process of having it translated and passed along to the next person. It is always fun to see what it becomes at the other end. Meanwhile, I like to research and see if it is an already-known problem, rather than just sit around and wait.

quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs /export/zero1)

Although given the entry in the msgbuf, perhaps you might want to fix up your quota settings on that particular filesystem.

Customers pay for a certain amount of disk quota and, being users, always stay close to the edge. Those messages are as constant as precipitation in the rainy season. Are you suggesting they indicate a problem beyond the user being out of space?

Lund