Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 18:16, Uwe Dippel wrote:
> I need to disappoint you here: an LED inactive for a few seconds is a very bad indicator of pending writes. I used to experience this with a stick on Ubuntu, which was silent until the 'umount' and then started to write for some 10 seconds.

Yikes, that's bizarre.

> On the other hand, you are spot-on w.r.t. 'umount'. Once the command is through, there is no more write to be expected. And if there was, it would be a serious bug. So this 'umount'ed system needs to be in a perfectly consistent state. (Which is why I wrote further up that the structure above the file system, that is the pool, is probably the culprit for all this misery.)

Yeah, once it's unmounted it really REALLY should be in a consistent state.

>> Conversely, anybody who is pulling disks / memory sticks off while IO is visibly incomplete really SHOULD expect to lose everything on them

> I hope you don't mean this. Not in a filesystem much hyped and much advanced. Of course, we expect corruption of all files whose 'write' has been boldly interrupted. But I, for one, expect the metadata of all other files to be readily available. Kind of, at the next use, telling me: "You idiot removed the plug while files were still in the process of writing. Don't expect them to be available now. Here is the list of all other files: [list of all files not being written then]"

It's good to have hopes, certainly. I'm just kinda cynical.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Write caches on X4540
We use several X4540s over here as well. What type of workload do you have, and how much of a performance increase did you see by disabling the write caches? We see the difference between our tests completing in around 2.5 minutes (with write caches) and around a minute and a half without them, in one instance. I'm trying to optimize our machines for a write-heavy environment, as our users will undoubtedly hit this limitation of the machines. -Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
This sounds like exactly the kind of problem I've been shouting about for 6 months or more. I posted a huge thread on availability on these forums because I had concerns over exactly this kind of hanging. ZFS doesn't trust hardware or drivers when it comes to your data - everything is checksummed. However, when it comes to seeing whether devices are responding, and checking for faults, it blindly trusts whatever the hardware or driver tells it. Unfortunately, that means ZFS is vulnerable to any unexpected bug or error in the storage chain. I've encountered at least two hang conditions myself (and I'm not exactly a heavy user), and I've seen several others on the forums, including a few on x4500s. Now, I do accept that errors like this will be few and far between, but it still means you have the risk that a badly handled error condition can hang your entire server, instead of just one drive. Solaris can handle things like CPUs or memory going faulty, for crying out loud. Its RAID storage system had better be able to handle a disk failing. Sun seem to be taking the approach that these errors should be dealt with in the driver layer. And while that's technically correct, a reliable storage system had damn well better be able to keep the server limping along while we wait for patches to the storage drivers. ZFS absolutely needs an error handling layer between the volume manager and the devices. It needs to time out devices that are not responding, and it needs to drop bad devices if they could cause problems elsewhere. And yes, I'm repeating myself, but I can't understand why this is not being acted on. Right now the error checking appears to be such that if an unexpected, or badly handled, error condition occurs in the driver stack, the pool or server hangs, whereas the expected behavior would be for just one drive to fail. The absolute worst case scenario should be that an entire controller has to be taken offline (and I would hope that the controllers in an x4500 would be running separate instances of the driver software). Not one of those conditions should be fatal; good storage designs cope with them all, and good error handling at the ZFS layer is absolutely vital when you have projects like Comstar introducing more and more types of storage device for ZFS to work with. Each extra type of storage introduces yet more software into the equation, and increases the risk of finding faults like this. While they will be rare, they should be expected, and ZFS should be designed to handle them. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
After all the statements read here, I just want to highlight another issue regarding ZFS. It was recommended here many times to set copies=2. Installing Solaris 10 10/2008 or snv_107, you can choose either UFS or ZFS. If you choose ZFS, the rpool will be created by default with 'copies=1'. If nobody mentions this, and you have a hanging system with no chance to access it or shut it down properly, leaving you no choice but to press the power button of your notebook through the desk plate, couldn't the same thing happen as with my external USB drive? This is the same sudden power-off event that seems to have damaged my pool. And it would be nice if ZFS could handle this. Another issue I haven't seen in this thread: ZFS is a layer on top of an EFI label. What about that in the case of a sudden power-off event? Regards, Dave. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
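For reference, copies can also be raised on an existing dataset, though this only protects blocks written after the property is set, and it does not help with in-flight writes lost at power-off. A minimal sketch, with a hypothetical dataset name:

# raise redundancy for future writes on a dataset (existing blocks are
# not rewritten, so this does not retroactively protect old data)
zfs set copies=2 rpool/export/home
zfs get copies rpool/export/home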
Re: [zfs-discuss] Write caches on X4540
> Are you sure that write cache is back on after restart?

Yes, I've checked with format -e, on each drive. When disabling the write cache with format, it also gives a warning stating this is the case. What I'm looking for is a faster way to do this than format -e -d disk -f script, for all 48 disks. From format, after a reboot:

selecting c10t7d0
[disk formatted]
/dev/dsk/c10t7d0s0 is part of active ZFS pool export. Please see zpool(1M).

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> cache

CACHE MENU:
        write_cache - display or modify write cache settings
        read_cache  - display or modify read cache settings
        !<cmd>      - execute <cmd>, then return
        quit
cache> write_cache

WRITE_CACHE MENU:
        display  - display current setting of write cache
        enable   - enable write cache
        disable  - disable write cache
        !<cmd>   - execute <cmd>, then return
        quit
write_cache> display
Write Cache is enabled
write_cache> disable
This setting is valid until next reset only. It is not saved permanently.
write_cache> display
Write Cache is disabled

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
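For reference, one way to script this across all drives at once; a minimal sketch, assuming a format command file like the one below and deriving the disk list from format's own output (both of which are assumptions, not from the original post):

# /tmp/wcd.cmd -- format command file that walks the cache menus
cache
write_cache
disable
quit
quit

# run it against every disk format can see; quiesce the pool first
for disk in $(echo | format 2>/dev/null | awk '/^ *[0-9]+\. /{print $2}'); do
    format -e -d "$disk" -f /tmp/wcd.cmd
done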
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> All that and yet the fact remains: I've never "ejected" a USB drive from OS X or Windows, I simply pull it and go, and I've never once lost data, or had it become unrecoverable or even corrupted.
>
> And yes, I do keep checksums of all the data sitting on them and periodically check it. So, for all of your ranting and raving, the fact remains even a *crappy* filesystem like fat32 manages to handle a hot unplug without any prior notice without going belly up.
>
> --Tim

Just wanted to chime in with my 2c here. I've also *never* unmounted a USB drive from Windows, and have been using them regularly since memory sticks became available. So that's 2-3 years of experience, and I've never lost work on a memory stick, nor had a file corrupted. I can also state with confidence that very, very few of the 100 staff working here will even be aware that it's possible to unmount a USB volume in Windows. They will all just pull the plug when their work is saved, and since they all come to me when they have problems, I think I can safely say that pulling USB devices really doesn't tend to corrupt filesystems in Windows. Everybody I know just waits for the light on the device to go out. And while this isn't really what ZFS is designed to do, I do think it should be able to cope. First of all, some kind of ZFS recovery tools are needed. There's going to be an awful lot of good data on that disk; making all of that inaccessible just because the last write failed isn't really on. It's a copy-on-write filesystem; zpool import really should be able to take advantage of that for recovering pools! I don't know the technicalities of how it works on disk, but my feeling is that the last successful mount point should be saved, and the last few uberblocks should also be available, so barring complete hardware failure, some kind of pool should be available for mounting. Also, if a drive is removed while writes are pending, some kind of error or warning is needed, either in the console or the GUI. It should be possible to prompt the user to re-insert the device so that the remaining writes can be completed. Recovering the pool in that situation should be easy - you can keep the location of the uberblock you're using in memory, and just re-write everything. Of course, that does assume that devices are being truthful when they say that data has been committed, but a little data loss from badly designed hardware is, I feel, acceptable, so long as ZFS can have a go at recovering corrupted pools when it does happen, instead of giving up completely like it does now. Yes, these problems happen more often with consumer-level hardware, but recovery tools like this are going to be very much appreciated by anybody who encounters problems like this on a server! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
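For what it's worth, the pieces for at least inspecting a damaged pool do exist today; a minimal sketch using zdb, with hypothetical device and pool names:

# dump the four vdev labels, including the uberblock arrays, from a
# pool member -- useful to see what state actually made it to disk
zdb -l /dev/rdsk/c1t0d0s0

# read-only walk of an exported pool's metadata without importing it
zdb -e tank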
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Thu, Feb 12, 2009 at 9:25 AM, Ross myxi...@googlemail.com wrote:
> [Ross's post, quoted in full above, snipped]

I'd imagine for the exact same reason short-stroking/right-sizing isn't a concern. We don't have this problem in the 7000 series, perhaps you should buy one of those. ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hello Bob, Wednesday, February 11, 2009, 11:25:12 PM, you wrote:

BF> I agree. ZFS apparently syncs uncommitted writes every 5 seconds.
BF> If there has been no filesystem I/O (including read I/O due to atime)
BF> for at least 10 seconds, and there has not been more data
BF> burst-written into RAM than can be written to disk in 10 seconds, then
BF> there should be nothing remaining to write.

That's not entirely true. After recent changes, writes can be delayed even up to 30s by default.

-- Best regards, Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
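Presumably this refers to the txg timeout rework; a minimal sketch of how one might inspect (and, with care, lower) the current value on a live system, assuming the zfs_txg_timeout tunable is the one involved:

# read the current txg timeout, in seconds, from the running kernel
echo zfs_txg_timeout/D | mdb -k

# temporarily set it back to 5 seconds (not persistent across reboot)
echo zfs_txg_timeout/W0t5 | mdb -kw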
Re: [zfs-discuss] Write caches on X4540
On Thu, Feb 12, 2009 at 10:33:40AM -0500, Greg Mason wrote: What I'm looking for is a faster way to do this than format -e -d disk -f script, for all 48 disks. Is the speed critical? I mean, do you have to pause startup while the script runs, or does it interfere with data transfer? -- Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Ross wrote: I can also state with confidence that very, very few of the 100 staff working here will even be aware that it's possible to unmount a USB volume in windows. They will all just pull the plug when their work is saved, and since they all come to me when they have problems, I think I can safely say that pulling USB devices really doesn't tend to corrupt filesystems in Windows. Everybody I know just waits for the light on the device to go out. The key here is that Windows does not cache writes to the USB drive unless you go in and specifically enable them. It caches reads but not writes. If you enable them you will lose data if you pull the stick out before all the data is written. This is the type of safety measure that needs to be implemented in ZFS if it is to support the average user instead of just the IT professionals. Regards, Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS panic in build 108?
I upgraded my 280R system to yesterday's nightly build, and when I rebooted, this happened:

Boot device: /p...@8,60/SUNW,q...@4/f...@0,0/d...@w212037e9abe4,0:a File and args:
SunOS Release 5.11 Version snv_108 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.

panic[cpu0]/thread=300045b6780: BAD TRAP: type=31 rp=2a100734df0 addr=28 mmu_fsr=0 occurred in module zfs due to a NULL pointer dereference

zfs: trap type = 0x31 addr=0x28 pid=55, pc=0x12f7e68, sp=0x2a100734691, tstate=0x4480001600, context=0x57
g1-g7: 2c19c6, 0, 0, 0, 0, 57, 300045b6780

02a100734b00 unix:die+74 (10bd400, 2a100734df0, 28, 0, 8, 2a100734bc0)
  %l0-3: 012f7e68 0010 0010 0100
  %l4-7: 2000 010bd770 010bd400 000b
02a100734be0 unix:trap+9e8 (2a100734df0, 0, 31, 0, 0, 1c00)
  %l0-3: 02a100734ce0 0005 03000428ac28
  %l4-7: 0001 0001 0028
02a100734d40 unix:ktl0+48 (0, 18ebfb8, 3514, 180c000, 3255d18, 2a100734f38)
  %l0-3: 0001 1400 004480001600 0101b128
  %l4-7: 03317840 018ec000 02a100734df0
02a100734e90 zfs:vdev_readable+4 (0, 1, fffd, 0, 0, 3)
  %l0-3: 013324a0 0006 0003 0050bd73
  %l4-7: 0007 0300030be080 0001 fff8
02a100734f40 zfs:vdev_mirror_child_select+74 (300070ef120, 0, 300070ef110, 300070ef110, 6, 97)
  %l0-3: 0300070ef120 0050bd73 0001
  %l4-7: 002c19c6 0003 7fff 0006
02a100734ff0 zfs:vdev_mirror_io_start+d8 (300027756e0, 1, 12fb53c, 300070ef110, 8, 1)
  %l0-3: fc00 0001 0001 4000
  %l4-7: 0300045b6780 032f0008 032f0058 0001
02a1007350d0 zfs:zio_execute+8c (300027756e0, 130c45c, 58, b, 18e2c00, 400)
  %l0-3: 018e2f60 0001 fc04 0001
  %l4-7: 0800 0800 0001
02a100735180 zfs:zio_wait+c (300027756e0, 1, 1, 30002775998, 0, 12c7ebc)
  %l0-3: 00074458 018e65a8 0753 030002632000
  %l4-7: 018e4000 0001 fc04 fc00
02a100735230 zfs:arc_read_nolock+83c (0, 32f, 2, 12c7c00, 0, 400)
  %l0-3: 0001 0001 02a100735408 018e4000
  %l4-7: 030006ce1bc0 02a10073542c 018e8450 030007677998
02a100735340 zfs:dmu_objset_open_impl+b0 (32f, 0, 30006bcd970, 2a1007354f8, 18e2c00, 3000664a540)
  %l0-3: 02a100735408 0001 012c7c00 03000664a550
  %l4-7: 0300027756e0 0300030be080 030006bcd940 03000664a570
02a100735430 zfs:dsl_pool_open+30 (32f, 50bd74, 32f01a8, , 0, 30006bcdbc8)
  %l0-3: 0001 0132d400 030006bcd940 0300054acd40
  %l4-7: 0300027756e0 0300030be080 0001 0001
02a100735500 zfs:spa_load+960 (32f, 32f0320, 1, 32f03c0, bab10c, 1815600)
  %l0-3: 013324a0 0006 0003 0050bd73
  %l4-7: 0007 0300030be080 0001 fff8
02a1007355f0 zfs:spa_open_common+80 (3000617e000, 2a100735758, 132d4b5, 2a100735818, 0, 18f9000)
  %l0-3: 0005 018f90f0 032f 0003
  %l4-7: 018f9000 0180c0e8 0180c0e8
02a1007356a0 zfs:spa_get_stats+18 (3000617e000, 2a100735818, 3000617e400, 800, 7a, 132d400)
  %l0-3: 01839ea0 0183fe30 0180c000
  %l4-7: 0001
02a100735760 zfs:zfs_ioc_pool_stats+10 (3000617e000, 0, 0, 7a, 3000617e000, 3000428ac28)
  %l0-3: 03000617e002 03000617e000 0019
  %l4-7: 005a 007a 01336400 01336400
02a100735820 zfs:zfsdev_ioctl+124 (18e39f0, 10005, ffbfecd8, 1000, 32eb178, 3000617e000)
  %l0-3: 018e3a00 0078 000f 0005
  %l4-7: 0014 018e3800 0001 013185d8
02a1007358d0 genunix:fop_ioctl+58 (300049fcc80, 5a05, ffbfecd8, 13, 0, 2a100735adc)
  %l0-3: 030004a63800 012b6d18 030001a60c00 018c5930
  %l4-7: 01877c00 0300030bfbf8 0001 018c6800
02a100735990 genunix:ioctl+164 (3, 5a05, ffbfecd8, 7a7000, 30003215928, 800)
  %l0-3: 0013 0003
  %l4-7: 0003
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, February 12, 2009 10:10, Ross wrote: Of course, that does assume that devices are being truthful when they say that data has been committed, but a little data loss from badly designed hardware is I feel acceptable, so long as ZFS can have a go at recovering corrupted pools when it does happen, instead of giving up completely like it does now. Well; not acceptable as such. But I'd agree it's outside ZFS's purview. The blame for data lost due to hardware actively lying and not working to spec goes to the hardware vendor, not to ZFS. If ZFS could easily and reliably warn about such hardware I'd want it to, but the consensus seems to be that we don't have a reliable qualification procedure. In terms of upselling people to a Sun storage solution, having ZFS diagnose problems with their cheap hardware early is clearly desirable :-). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Recover data in /root
On 02/11/09 12:14, Jonny Gerold wrote:
> I have a non-bootable disk and need to recover files from /root... When I import the disk via zpool import, /root isn't mounted... Thanks, Jonny

I need more information. Is this a root pool? (You say it's non-bootable, but does that mean it's not a root pool, or that it's a root pool that is failing to boot?) Is the name of the pool "root"? Are there error messages from the import? What does zfs get all <dataset-name> report after the import? What's the output of zfs list? Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
Heh, yeah, I've thought the same kind of thing in the past. The problem is that the argument doesn't really work for system admins. As far as I'm concerned, the 7000 series is a new hardware platform, with relatively untested drivers, running a software solution that I know is prone to locking up when hardware faults are handled badly by drivers. Fair enough, that actual solution is out of our price range, but I would still be very dubious about purchasing it. At the very least I'd be waiting a year for other people to work the kinks out of the drivers. Which is a shame, because ZFS has so many other great features it's easily our first choice for a storage platform. The one and only concern we have is its reliability. We have snv_106 running as a test platform now. If I felt I could trust ZFS 100% I'd roll it out tomorrow.

On Thu, Feb 12, 2009 at 4:25 PM, Tim t...@tcsac.net wrote:
> On Thu, Feb 12, 2009 at 9:25 AM, Ross myxi...@googlemail.com wrote:
>> [Ross's post, quoted in full above, snipped]
>
> I'd imagine for the exact same reason short-stroking/right-sizing isn't a concern. We don't have this problem in the 7000 series, perhaps you should buy one of those. ;) --Tim

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
On Thu, Feb 12, 2009 at 19:02, Brandon High bh...@freaks.com wrote:
> There's a post there from a guy using two of the AOC-USAS-L8i in his system here: http://hardforum.com/showthread.php?p=1033321345

Read again---he's using the AOC-SAT2-MV8, which is PCI-X. That is known to work fine, even in PCI slots. He does ask in his post #7 if the AOC-USAS-L8i card works, but no response yet. Unfortunately, Hardforums have been having database problems (again, sigh) and have lost all posts made between the 5th and 11th of February. Thus, I can't link to the post I saw about the card not working, and I don't see it in Google cache or archive.org. Perhaps I'll ask again and see if I can find the answer.

> As far as I know, ULI is the same as PCI-e, just backwards.

I think you mean UIO. And compatible though it may be, it still leaves open the question of how to mount the card securely. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS snapshot splitting joining
> The problem was with the shell. For whatever reason, /usr/bin/ksh can't rejoin the files correctly. When I switched to /sbin/sh, the rejoin worked fine, the cksums matched, ... The ksh I was using is:
>
> # what /usr/bin/ksh
> /usr/bin/ksh: Version M-11/16/88i SunOS 5.10 Generic 118873-04 Aug 2006
>
> So, is this a bug in the ksh included with Solaris 10?

Are you able to reproduce the issue with a script like this (needs ~ 200 gigabytes of free disk space)? I can't...

==
% cat split.sh
#!/bin/ksh
bs=1k
count=`expr 57 \* 1024 \* 1024`
split_bs=8100m
set -x
dd if=/dev/urandom of=data.orig bs=${bs} count=${count}
split -b ${split_bs} data.orig data.split.
ls -l data.split.*
cat data.split.a[a-z] > data.join
cmp -l data.orig data.join
==

On SX:CE / OpenSolaris the same version of /bin/ksh = /usr/bin/ksh is present:

% what /usr/bin/ksh
/usr/bin/ksh: Version M-11/16/88i SunOS 5.11 snv_104 November 2008

I did run the script in a directory in an uncompressed zfs filesystem:

% ./split.sh
+ dd if=/dev/urandom of=data.orig bs=1k count=59768832
59768832+0 records in
59768832+0 records out
+ split -b 8100m data.orig data.split.
+ ls -l data.split.aa data.split.ab data.split.ac data.split.ad data.split.ae data.split.af data.split.ag data.split.ah
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:31 data.split.aa
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:35 data.split.ab
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:39 data.split.ac
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:43 data.split.ad
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:48 data.split.ae
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:53 data.split.af
-rw-r--r-- 1 jk usr 8493465600 Feb 12 18:58 data.split.ag
-rw-r--r-- 1 jk usr 1749024768 Feb 12 18:58 data.split.ah
+ cat data.split.aa data.split.ab data.split.ac data.split.ad data.split.ae data.split.af data.split.ag data.split.ah
+ 1> data.join
+ cmp -l data.orig data.join
2002.33u 2302.05s 1:51:06.85 64.5%

As expected, it works without problem. The files are bit for bit identical after splitting and joining. For me this looks more as if your hardware is broken: http://opensolaris.org/jive/thread.jspa?messageID=338148 A single bad bit (!) in the middle of the joined file is very suspicious... -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
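To localize a corruption like this, it can help to checksum each split piece individually rather than only the joined result; a minimal sketch, reusing the file names from the script above (the temp-file names are hypothetical):

# run on each machine, then compare the lists; a mismatching piece
# points at the step or hardware that flipped the bit
cksum data.split.* > /tmp/pieces.cksum
# ... copy the other machine's list over, then:
diff /tmp/pieces.cksum /tmp/pieces.other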
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet d...@dd-b.net wrote:
> [David's reply, quoted in full above, snipped]
> If ZFS could easily and reliably warn about such hardware I'd want it to, but the consensus seems to be that we don't have a reliable qualification procedure.

Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring correctly by issuing a commit and immediately reading back to see if it was indeed committed or not. Like a zfs test cXtX. Of course, then you can't just blame the hardware everytime something in zfs breaks ;) --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
On Thu, Feb 12, 2009 at 1:22 PM, Will Murnane will.murn...@gmail.com wrote:
> [Will's reply, quoted in full above, snipped]

Are you selectively ignoring responses to this thread or something? Dave has already stated he *HAS IT WORKING TODAY*. Dave wrote: *Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927 motherboard.* The AOC-SAT2-MV8 also works in a regular PCI slot (although it is a PCI-X card). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Write caches on X4540
Well, since the write cache flush command is disabled, I would like this to happen as early as practically possible in the bootup process, as ZFS will not be issuing the cache flush commands to the disks. I'm not really sure what happens in the case where the cache flush command is disabled, something makes its way into the write cache, and then the cache is disabled. Does this mean the write cache is flushed to disk when the cache is disabled? If so, then I guess it's less critical when it happens in the bootup process, or whether it's permanent... -Greg

A Darren Dunham wrote:
> On Thu, Feb 12, 2009 at 10:33:40AM -0500, Greg Mason wrote:
>> What I'm looking for is a faster way to do this than format -e -d disk -f script, for all 48 disks.
> Is the speed critical? I mean, do you have to pause startup while the script runs, or does it interfere with data transfer?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
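Presumably the cache-flush behavior was disabled with the usual ZFS tunable; a sketch of the /etc/system line commonly used for this (an assumption about how it was done here, and only safe when the disk caches are nonvolatile or themselves disabled):

# /etc/system: stop ZFS from issuing SYNCHRONIZE CACHE to the disks
set zfs:zfs_nocacheflush = 1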
Re: [zfs-discuss] Solaris and zfs versions
Mark, I believe creating an older-version pool is supported:

zpool create -o version=<vers> whirl c0t0d0

I'm not sure what version of ZFS in Solaris 10 you are running. Try running zpool upgrade and replacing <vers> above with that version number. Neil.

: trasimene ; zpool create -o version=11 whirl c0t0d0
: trasimene ; zpool get version whirl
NAME   PROPERTY  VALUE    SOURCE
whirl  version   11       local
: trasimene ; zpool upgrade
This system is currently running ZFS pool version 14.
The following pools are out of date, and can be upgraded. After being upgraded, these pools will no longer be accessible by older software versions.
VER  POOL
---  ----
11   whirl
Use 'zpool upgrade -v' for a list of available versions and their associated features.
: trasimene ;

On 02/12/09 11:42, Mark Winder wrote:
> We’ve been experimenting with zfs on OpenSolaris 2008.11. We created a pool in OpenSolaris and filled it with data. Then we wanted to move it to a production Solaris 10 machine (generic_137138_09), so I “zpool exported” in OpenSolaris, moved the storage, and “zpool imported” in Solaris 10. We got: Cannot import ‘deadpool’: pool is formatted using a newer ZFS version. We would like to be able to move pools back and forth between the OS’s. Is there a way we can upgrade Solaris 10 to the same supported zfs version (or create downgraded pools in OpenSolaris)? Thanks! Mark

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris and zfs versions
Mark Winder wrote:
> [original question, quoted in full in the previous message, snipped]

You have to create the pool with the version supported on the oldest system. See the version property on the zpool man page. Solaris 10 Update 6 uses version 10. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
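Putting the two answers together, a minimal sketch of the round-trip-safe workflow (pool and device names hypothetical):

# on the OpenSolaris box, check which versions are available
zpool upgrade -v
# create the pool at the Solaris 10 U6 level so it can move both ways
zpool create -o version=10 deadpool c0t0d0
zpool get version deadpool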
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote:
> [Ross's Windows/USB observations, quoted in full above, snipped]
> The key here is that Windows does not cache writes to the USB drive unless you go in and specifically enable them. It caches reads but not writes. If you enable them you will lose data if you pull the stick out before all the data is written. This is the type of safety measure that needs to be implemented in ZFS if it is to support the average user instead of just the IT professionals.

That implies that ZFS will have to detect removable devices and treat them differently than fixed devices. It might have to be an option that can be enabled for higher performance with reduced data security. -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
> Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring correctly by issuing a commit and immediately reading back to see if it was indeed committed or not. Like a zfs test cXtX. Of course, then you can't just blame the hardware everytime something in zfs breaks ;)

A read of data in the disk cache will be served from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter. The only way to test this is to write and then remove the power from the disk. Not easy in software. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
That would be the ideal, but really I'd settle for just improved error handling and recovery for now. In the longer term, disabling write caching by default for USB or Firewire drives might be nice.

On Thu, Feb 12, 2009 at 8:35 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
> [previous message quoted in full above, snipped]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Is this the crux of the problem? http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. This can cause catastrophic data corruption in the event of power loss, even for filesystems like ZFS that are designed to survive it. Dropping a flush-cache command is just as bad as dropping a write. It violates the interface that software relies on to use the device.' -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] importing unformatted partition
Hello, I need advice on how to import an unformatted partition. I split my 150GB disk into 3 partitions:
1. 50GB Windows
2. 50GB OpenSolaris
3. 50GB unformatted

I would like to import the 3rd partition as another pool but I can't see this partition.

sh-3.2# format -e
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): ^[[^C

I guess that 0. is the Windows partition and 1. is OpenSolaris.

sh-3.2# zpool list
NAME   SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
rpool  59.5G  3.82G  55.7G  6%   ONLINE  -
sh-3.2# zpool import
sh-3.2#

How can I find and import the remaining partition? Thanks for help. Regards, Jan Hlodan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
On Thu, Feb 12, 2009 at 20:05, Tim t...@tcsac.net wrote:
> Are you selectively ignoring responses to this thread or something? Dave has already stated he *HAS IT WORKING TODAY*.

No, I saw that post. However, I saw one unequivocal "it doesn't work" earlier (even if I can't show it to you), which implies to me that whether the card works or not in a particular setup is somewhat finicky. So here's one datapoint:

Dave wrote:
> Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927 motherboard.

but the thread that Brandon linked to does not contain a datapoint. For what it's worth, I think these are the only two datapoints I've seen; most threads about this card end up debating back and forth whether it will work, with nobody actually buying and testing the card. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] importing unformatted partition
On Thu, Feb 12, 2009 at 21:59, Jan Hlodan jh231...@mail-emea.sun.com wrote:
> I would like to import the 3rd partition as another pool but I can't see this partition.
> sh-3.2# format -e
> Searching for disks...done
> AVAILABLE DISK SELECTIONS:
> 0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
> 1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
> I guess that 0. is the Windows partition and 1. is OpenSolaris

What you see there are whole disks, not partitions. Try zpool status, which will show you that rpool is on something like c9d0s0. Then go into format again, pick 1 (in my example), type fdisk to look at the DOS-style partition table, and verify that the partitioning of the disk matches what you thought it was. Then you can create a new zpool with something like zpool create data c9d0p3. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
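A minimal end-to-end sketch of that sequence, using the device names from the listing above (the partition number is an assumption; verify it in fdisk first):

zpool status rpool        # confirms rpool lives on e.g. c9d0s0
format -e                 # pick disk 1 (c9d0), then run 'fdisk' and
                          # verify which partition is the unformatted one
zpool create data c9d0p3  # new pool on the third fdisk partition
zpool list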
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
Will Murnane wrote:
> [Will's reply, quoted in full in the previous message, snipped]

I can tell you that the USAS-L8i absolutely works fine with a Tyan 2927 in a Chenbro RM31616 3U rackmount chassis. In fact, I have two of the USAS-L8i in this chassis because I forgot that, unlike the 8-port AOC-SAT2-MV8, the USAS-L8i can support up to 122 drives. I have 8 drives connected to the first USAS-L8i. They are set up in a raidz-2 and I get 90-120MB/sec read and 60-75MB/sec write during my rsyncs from linux machines (this solaris box is only used to store backup data). I plan on removing the second USAS-L8i and connecting all 16 drives to the first USAS-L8i when I need more storage capacity. I have no doubt that it will work as intended. I will report to the list otherwise. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
For what it's worth, I know that at least one person is using a LSI SAS3081E card which I believe is based on exactly the same chipset: http://www.opensolaris.org/jive/message.jspa?messageID=186415 -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
That does look like the issue being discussed. It's a little alarming that the bug was reported against snv54 and is still not fixed :( Does anyone know how to push for resolution on this? USB is pretty common, like it or not, for storage purposes - especially amongst the laptop-using dev crowd that OpenSolaris apparently targets.

On Thu, Feb 12, 2009 at 4:44 PM, bdebel...@intelesyscorp.com bdebel...@intelesyscorp.com wrote:
> [bug 6424510 description, quoted in full in the previous message, snipped]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, February 12, 2009 14:02, Tim wrote:
> Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring correctly by issuing a commit and immediately reading back to see if it was indeed committed or not. Like a zfs test cXtX. Of course, then you can't just blame the hardware everytime something in zfs breaks ;)

I can imagine it fairly easily. All you've got to work with is what the drive says about itself, and how fast, and what we're trying to test is whether it lies. It's often very hard to catch it out on this sort of thing. We need somebody who really understands the command sets available to send to modern drives (which is not me) to provide a test they think would work, and people can argue or try it. My impression, though, is that the people with the expertise are so far consistently saying it's not possible. I think at this point somebody who thinks it's possible needs to do the work to at least propose a specific test, or else we have to give up on the idea. I'm still hoping for at least some kind of qualification procedure involving manual intervention (hence not something that could be embodied in a simple command you just typed), but we're not seeing even this so far. Of course, the other side of this is that, if people know that drives have these problems, there must in fact be some way to demonstrate it, or they wouldn't know. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] strange 'too many errors' msg
I think you could try clearing the pool - however, consulting the fault management tools (fmdump and its kin) might be smart first. It's possible this is an error in the controller. The output of 'cfgadm' might be of use also.

On Wed, Feb 11, 2009 at 7:12 PM, Jens Elkner jel+...@cs.uni-magdeburg.de wrote:
> Hi, just found on a X4500 with S10u6:
>
> fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
> EVENT-TIME: Wed Feb 11 16:03:26 CET 2009
> PLATFORM: Sun Fire X4500, CSN: 00:14:4F:20:E0:2C , HOSTNAME: peng
> SOURCE: zfs-diagnosis, REV: 1.0
> EVENT-ID: 74e6f0ec-b1e7-e49b-8d71-dc1c9b68ad2b
> DESC: The number of checksum errors associated with a ZFS device exceeded acceptable levels. Refer to http://sun.com/msg/ZFS-8000-GH for more information.
> AUTO-RESPONSE: The device has been marked as degraded. An attempt will be made to activate a hot spare if available.
> IMPACT: Fault tolerance of the pool may be compromised.
> REC-ACTION: Run 'zpool status -x' and replace the bad device.
>
> zpool status -x
> ...
>   mirror      DEGRADED     0     0     0
>     spare     DEGRADED     0     0     0
>       c6t6d0  DEGRADED     0     0     0  too many errors
>       c4t0d0  ONLINE       0     0     0
>     c7t6d0    ONLINE       0     0     0
> ...
> spares
>   c4t0d0      INUSE     currently in use
>   c4t4d0      AVAIL
>
> The strange thing is that for more than 3 months there was no single error logged for any drive. IIRC, before u4 I occasionally saw a bad checksum error message, but that was obviously the result of the well-known race condition in the marvell driver when heavy writes took place. So I tend to interpret it as a false alarm and am thinking about 'zpool ... clear c6t6d0'. What do you think, is this a good idea? Regards, jel.
> BTW: the zpool status -x message refers to http://www.sun.com/msg/ZFS-8000-9P, the event to http://sun.com/msg/ZFS-8000-GH - a little inconsistent, I think.
> -- Otto-von-Guericke University http://www.cs.uni-magdeburg.de/ Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2 39106 Magdeburg, Germany Tel: +49 391 67 12768

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
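A minimal sketch of the diagnosis steps suggested above (the pool name is hypothetical; clear only if the underlying ereports look spurious):

fmdump -eV | more                 # raw ereports behind the diagnosis
fmdump -v -u 74e6f0ec-b1e7-e49b-8d71-dc1c9b68ad2b   # events for this fault
cfgadm -al                        # controller/occupant state
zpool clear tank c6t6d0           # clear the errors, then watch for recurrence
zpool detach tank c4t0d0          # return the spare once the mirror is healthy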
Re: [zfs-discuss] Is Disabling ARC on SolarisU4 possible?
Thanks Nathan, I want to test the underlying performance, but of course the problem is that I want to test the 16 or so disks in the stripe, rather than individual devices. Thanks, Rob

On 28/01/2009 22:23, Nathan Kroenert nathan.kroen...@sun.com wrote:
> Also - my experience with a very small ARC is that your performance will stink. ZFS is an advanced filesystem that IMO makes some assumptions about capability and capacity of current hardware. If you don't give it what it's expecting, your results may be equally unexpected. If you are keen to test the *actual* disk performance, you should just use the underlying disk device like /dev/rdsk/c0t0d0s0. Beware, however, that any writes to these devices will indeed result in the loss of the data on those devices, zpools or other. Cheers. Nathan.
>
> Richard Elling wrote:
>> Rob Brown wrote:
>>> Afternoon, in order to test my storage I want to stop the caching effect of the ARC on a ZFS filesystem. I can do similar on UFS by mounting it with the directio flag.
>> No, not really the same concept, which is why Roch wrote http://blogs.sun.com/roch/entry/zfs_and_directio
>>> I saw the following two options on a nevada box which presumably control it: primarycache, secondarycache
>> Yes, to some degree this offers some capability. But I don't believe they are in any release of Solaris 10. -- richard
>>> But I'm running Solaris 10U4 which doesn't have them - can I disable it? Many thanks, Rob

| Robert Brown - ioko Professional Services | Mobile: +44 (0)7769 711 885 |
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
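For raw aggregate throughput without the ARC in the way, one rough approach is to read all the underlying devices concurrently; a minimal sketch with hypothetical device names (write tests against raw devices would destroy the data on them):

# read 1 GB from each disk in parallel and time it
for d in c0t0d0 c0t1d0 c0t2d0; do
    ptime dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=1024 &
done
wait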
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I just tried putting a pool on a USB flash drive, writing a file to it, and then yanking it. I did not lose any data or the pool, but I had to reboot before I could get any zpool command to complete without freezing. I also had the OS reboot once on its own, when I tried to issue a zpool command to the pool. The OS didn't notice the disk was yanked until I tried to query its status. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Two zvol devices one volume?
Hi, can anyone explain the following to me? Two zvol devices point at the same data. I noticed this while installing osol 2008.11 in xVM, when I saw that there was already a partition on the installation disk. An old dataset that I deleted (since I gave it a slightly different name than I intended) was not removed under /dev. I should not have reused that name, but two device links should perhaps not point to the same device either.

# zfs list | grep xvm/dsk
zpool01/xvm/dsk                 25.0G  2.63T  24.0K  /zpool01/xvm/dsk
zpool01/xvm/dsk/osol01-dsk01      10G  2.64T  2.53G  -
zpool01/xvm/dsk/ubuntu01-dsk01    10G  2.64T  21.3K  -

# ls -l /dev/zvol/dsk/zpool01/xvm/dsk
total 3
lrwxrwxrwx 1 root root 41 Feb 10 18:19 osol01-dsk01 -> ../../../../../../devices/pseudo/z...@0:4c
lrwxrwxrwx 1 root root 41 Feb 10 18:14 ubuntu-01-dsk01 -> ../../../../../../devices/pseudo/z...@0:4c
lrwxrwxrwx 1 root root 41 Feb 10 18:19 ubuntu01-dsk01 -> ../../../../../../devices/pseudo/z...@0:5c

# zpool history | grep xvm
2009-02-08.22:42:12 zfs create zpool01/xvm
2009-02-08.22:42:23 zfs create zpool01/xvm/media
2009-02-08.22:42:45 zfs create zpool01/xvm/dsk
2009-02-10.18:14:41 zfs create -V 10G zpool01/xvm/dsk/ubuntu-01-dsk01
2009-02-10.18:15:10 zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
2009-02-10.18:15:21 zfs create -V 10G zpool01/xvm/dsk/ubuntu01-dsk01
2009-02-10.18:15:33 zfs create -V 10G zpool01/xvm/dsk/osol01-dsk01

# uname -a
SunOS ollespappa 5.11 snv_107 i86pc i386 i86xpv

While I am writing, are there any known issues with sharemgr and zfs in this release? svc:/network/shares/group:zfs hangs when going down, since sharemgr stop zfs never returns... Thanks, Henrik Johansson http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
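The stale link itself can be cleaned up by hand; a minimal sketch using the paths from the listing above (devfsadm may or may not prune it automatically, since its target node still exists for the other zvol):

# remove the leftover symlinks for the destroyed zvol
rm /dev/zvol/dsk/zpool01/xvm/dsk/ubuntu-01-dsk01
rm /dev/zvol/rdsk/zpool01/xvm/dsk/ubuntu-01-dsk01   # if present
# prune any other dangling /dev links
devfsadm -C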
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, 2009-02-12 at 17:35 -0500, Blake wrote:
> That does look like the issue being discussed. It's a little alarming that the bug was reported against snv54 and is still not fixed :(

bugs.opensolaris.org's information about this bug is out of date. It was fixed in snv_54:

changeset:   3169:1dea14abfe17
user:        phitran
date:        Sat Nov 25 11:05:17 2006 -0800
files:       usr/src/uts/common/io/scsi/targets/sd.c
6424510 usb ignores DKIOCFLUSHWRITECACHE

- Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 12-Feb-09, at 3:02 PM, Tim wrote:
> [earlier exchange quoted in full above, snipped]
> Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring correctly by issuing a commit and immediately reading back to see if it was indeed committed or not.

You do realise that this is not as easy as it looks? :) For one thing, the drive will simply serve the read from cache. It's hard to imagine a test that doesn't involve literally pulling plugs; even better, a purpose-built hardware test harness. Nonetheless I hope that someone comes up with a brilliant test. But if the ZFS team hasn't found one yet... it looks grim :) --Toby

> Like a zfs test cXtX. Of course, then you can't just blame the hardware everytime something in zfs breaks ;)
> --Tim

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Thu, 12 Feb 2009, Ross Smith wrote: As far as I'm concerned, the 7000 series is a new hardware platform, You are joking right? Have you ever looked at the photos of these new systems or compared them to other Sun systems? They are just re-purposed existing systems with a bit of extra secret sauce added. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I'm sure it's very hard to write good error handling code for hardware events like this. I think, after skimming this thread (a pretty wild ride), we can at least decide that there is an RFE for a recovery tool for zfs - something to allow us to try to pull data from a failed pool. That seems like a reasonable tool to request/work on, no? On Thu, Feb 12, 2009 at 6:03 PM, Toby Thain t...@telegraphics.com.au wrote: [quoted discussion snipped - see Toby's message above] ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FW: Supermicro AOC-SAT2-MV8 hang when drive removed
On Thu, Feb 12, 2009 at 5:16 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Thu, 12 Feb 2009, Ross Smith wrote: As far as I'm concerned, the 7000 series is a new hardware platform, You are joking right? Have you ever looked at the photos of these new systems or compared them to other Sun systems? They are just re-purposed existing systems with a bit of extra secret sauce added. Bob Ya, that *secret sauce* is what makes it a new system. And out of the last four X4240s I've ordered, two had to have new motherboards installed within a week, and one had to have a new power supply. The other appears to have a DVD-ROM drive going flaky. So the fact that they're based on existing hardware isn't exactly confidence-inspiring either. Sun's old SPARC gear: rock solid. The newer x64 gear has been leaving a bad taste in my mouth, TBQH. The engineering behind the systems when I open them up is absolutely phenomenal. The failure rate, however, is downright scary. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 21:45, Mattias Pantzare wrote: A read of data in the disk cache will be read from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter. The only way to test this is to write and then remove the power from the disk. Not easy in software. Not true with modern SATA drives that support NCQ, as there is a FUA bit that can be set by the driver on NCQ reads. If the device implements the spec, any overlapped write cache data will be flushed, invalidated, and a fresh read done from the non-volatile media for the FUA read command. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
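For the curious, here is a rough sketch of what issuing such a FUA read from userland might look like on Solaris via the USCSI pass-through. The READ(10) CDB layout and the FUA bit are standard SCSI, but whether sd and a given HBA actually translate the bit through to an NCQ read on a SATA drive is an assumption - and, as the next message points out, part of exactly what one would be trying to verify:

/* fuaread.c -- hypothetical sketch: read one block at LBA 0 with the
 * FUA bit set in a READ(10) CDB, via the Solaris USCSI pass-through.
 * Needs root.  Compile: cc -o fuaread fuaread.c
 * Usage: ./fuaread /dev/rdsk/c1t0d0p0
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/scsi/impl/uscsi.h>

int main(int argc, char **argv)
{
    unsigned char cdb[10], buf[512];
    struct uscsi_cmd ucmd;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s raw-device\n", argv[0]);
        return (1);
    }
    if ((fd = open(argv[1], O_RDONLY)) < 0) {
        perror("open");
        return (1);
    }
    (void) memset(cdb, 0, sizeof (cdb));
    cdb[0] = 0x28;      /* READ(10) */
    cdb[1] = 0x08;      /* FUA: read from media, not from cache */
    cdb[8] = 1;         /* transfer length: one block, LBA 0 */

    (void) memset(&ucmd, 0, sizeof (ucmd));
    ucmd.uscsi_cdb = (caddr_t)cdb;
    ucmd.uscsi_cdblen = sizeof (cdb);
    ucmd.uscsi_bufaddr = (caddr_t)buf;
    ucmd.uscsi_buflen = sizeof (buf);
    ucmd.uscsi_flags = USCSI_READ | USCSI_SILENT;
    ucmd.uscsi_timeout = 30;

    if (ioctl(fd, USCSICMD, &ucmd) < 0)
        perror("USCSICMD");
    else
        printf("FUA read completed, SCSI status 0x%x\n",
            ucmd.uscsi_status);
    (void) close(fd);
    return (0);
}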
Re: [zfs-discuss] Two zvol devices one volume?
I tried to export the zpool also, and I got this; the strange part is that it sometimes still thinks that the ubuntu-01-dsk01 dataset exists:

# zpool export zpool01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
cannot unmount '/zpool01/dump': Device busy

But:

# zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist

Regards On Feb 12, 2009, at 11:51 PM, Henrik Johansson wrote: [original message snipped - quoted in full above] Henrik Johansson http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Blake wrote: I'm sure it's very hard to write good error handling code for hardware events like this. I think, after skimming this thread (a pretty wild ride), we can at least decide that there is an RFE for a recovery tool for zfs - something to allow us to try to pull data from a failed pool. That seems like a reasonable tool to request/work on, no? The ability to force a rollback to an older uberblock in order to be able to access the pool (in the case of a corrupt current uberblock) should be the ZFS developers' very top priority, IMO. I'd offer to do it myself, but I have nowhere near the ability to do so. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Two zvol devices one volume?
Henrik Johansson wrote: I tried to export the zpool also, and I got this, the strange part is that it sometimes still thinks that the ubuntu-01-dsk01 dataset exists: # zpool export zpool01 cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist cannot unmount '/zpool01/dump': Device busy But: # zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01 cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist Regards I have seen this 'phantom dataset' with a pool on nv93. I created a zpool, created a dataset, then destroyed the zpool. When creating a new zpool on the same partitions/disks as the destroyed zpool, upon export I receive the same message as you describe above, even though I never created the dataset in the new pool. Creating a dataset of the same name and then destroying it doesn't seem to get rid of it, either. I never did remember to file a bug for it... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 12-Feb-09, at 7:02 PM, Eric D. Mudama wrote: On Thu, Feb 12 at 21:45, Mattias Pantzare wrote: A read of data in the disk cache will be read from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter. The only way to test this is to write and then remove the power from the disk. Not easy in software. Not true with modern SATA drives that support NCQ, as there is a FUA bit that can be set by the driver on NCQ reads. If the device implements the spec, ^^ Spec compliance is what we're testing for... We wouldn't know if this special variant is working correctly either. :) --T any overlapped write cache data will be flushed, invalidated, and a fresh read done from the non-volatile media for the FUA read command. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
Hey Tim, I've been happily using the AOC-USAS-L8i since we started talking about it a while ago. I have it stuck in a generic motherboard from eBay, in a PCI-Express x16 slot, since I wasn't going to have a 3D card in my NAS device or anything. Using 8 SATA drives across its two ports, with mirrored vdevs in my pool. Currently on 2008.11; haven't done an image-update in a bit. ~Bryan -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Blake, On Thu, Feb 12, 2009 at 05:35:14PM -0500, Blake wrote: That does look like the issue being discussed. It's a little alarming that the bug was reported against snv54 and is still not fixed :(

Looks like the bug report is out of sync. I see that the bug has been fixed in B54. Here is the link to the source gate which shows that the fix is in the gate:

http://src.opensolaris.org/source/search?q=&defs=&refs=&path=&hist=6424510&project=%2Fonnv

And here are the diffs:

http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/io/scsi/targets/sd.c?r2=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403169&r1=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403138

Thanks and regards, Sanjeev.

Does anyone know how to push for resolution on this? USB is pretty common, like it or not, for storage purposes - especially amongst the laptop-using dev crowd that OpenSolaris apparently targets. On Thu, Feb 12, 2009 at 4:44 PM, bdebel...@intelesyscorp.com bdebel...@intelesyscorp.com wrote: Is this the crux of the problem? http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 'For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. This can cause catastrophic data corruption in the event of power loss, even for filesystems like ZFS that are designed to survive it. Dropping a flush-cache command is just as bad as dropping a write. It violates the interface that software relies on to use the device.' -- This message posted from opensolaris.org

-- Sanjeev Bagewadi Solaris RPE Bangalore, India ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
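To make the bug concrete: DKIOCFLUSHWRITECACHE is an ordinary ioctl that any program can issue against a raw device. A minimal, hypothetical sketch (needs root) that asks the driver stack for a synchronous cache flush and reports whether it was accepted:

/* flushioctl.c -- issue DKIOCFLUSHWRITECACHE against a raw device.
 * This is the ioctl that bug 6424510 says the USB path used to drop.
 * Sketch only.  Compile: cc -o flushioctl flushioctl.c
 * Usage: ./flushioctl /dev/rdsk/c1t0d0p0
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/dkio.h>

int main(int argc, char **argv)
{
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s raw-device\n", argv[0]);
        return (1);
    }
    if ((fd = open(argv[1], O_RDWR)) < 0) {
        perror("open");
        return (1);
    }
    /* NULL arg => synchronous flush: don't return until complete. */
    if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) < 0)
        perror("DKIOCFLUSHWRITECACHE");
    else
        printf("flush accepted by the driver stack\n");
    (void) close(fd);
    return (0);
}

Note that a success return only shows the flush was accepted by the stack, not that the device actually emptied its cache - which is exactly why a driver silently dropping the ioctl is so dangerous.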
Re: [zfs-discuss] ZFS: unreliable for professional usage?
bcirvin, you proposed something to allow us to try to pull data from a failed pool.

Yes and no. 'Yes' as a pragmatic solution; 'no' for what ZFS was 'sold' to be: the last filesystem mankind would need. It was conceived as a filesystem that does not need recovery, due to its guaranteed consistent states on the/any drive - or better: at any moment. If this were truly the case, a recovery program would not be needed, and I don't think SUN would like one either. It is also more than suboptimal to prevent caching, as proposed by others; this is but a very ugly hack.

Again, and I have yet to receive comments on this, the original poster claimed to have done a proper flush/sync, and left a 100% consistent file system behind on his drive. At reboot, the pool, the higher entity, failed miserably. Of course, one could now conceive a program that scans the whole drive, like in the good ole days on ancient file systems, to recover all those 100% correct file system(s). Or, one could - as proposed - add an Überblock, like we had the FAT mirror in the last millennium.

The alternative, and engineering-wise much better, solution would be to diagnose the weakness on the contextual or semantic level: where 100% consistent file systems cannot be communicated to the operating system. This - so it seems - is (still) a shortcoming of the concept of ZFS. Which might be solved by means of yesterday, I agree. Or by throwing more work into the level of the volume management, the pools.

Without claiming to have the solution, conceptually I might want to propose doing away with the static, look-up-table-like structure of the pool, as stored in a mirror or Überblock. Could it be feasible to associate pools dynamically? Could it be feasible that the filesystems in a pool create a (new) handle once they are updated in a consistent manner? And that, when the drive is plugged in/turned on, the software simply collects all the handles of all file systems on that drive? Then export/import would be possible, but no longer required, since the filesystems form their own entities. They could still have associated contextual/semantic (stored) structures into which they are 'plugged' once the drive is up, if one wanted ('logical volume'). But with or without, the pool would self-configure when the drive starts, by picking up all file system handles.

Uwe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
Thanks for all the help guys. Based on the success reports, I'll give it a shot in my Intel S3210SHLC board next week when the UIO card arrives. I'll report back on the success or destruction that follows... now I just hope Solaris 10 10/08 supports it, but it sounds like it should. Cheers, Brent

-----Original Message----- From: Dave [mailto:dave-...@dubkat.com] Sent: Fri 2/13/2009 9:22 AM To: Will Murnane Cc: Tim; zfs-discuss@opensolaris.org; Brent Avery Subject: Re: [zfs-discuss] Supermicro AOC-USAS-L8i

Will Murnane wrote: On Thu, Feb 12, 2009 at 20:05, Tim t...@tcsac.net wrote: Are you selectively ignoring responses to this thread or something? Dave has already stated he *HAS IT WORKING TODAY*. No, I saw that post. However, I saw one unequivocal it doesn't work earlier (even if I can't show it to you), which implies to me that whether the card works or not in a particular setup is somewhat finicky. So here's one datapoint: Dave wrote: Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927 motherboard. but the thread that Brandon linked to does not contain a datapoint. For what it's worth, I think these are the only two datapoints I've seen; most threads about this card end up debating back and forth whether it will work, with nobody actually buying and testing the card.

I can tell you that the USAS-L8i absolutely works fine with a Tyan 2927 in a Chenbro RM31616 3U rackmount chassis. In fact, I have two of the USAS-L8i in this chassis because I forgot that, unlike the 8-port AOC-SAT2-MV8, the USAS-L8i can support up to 122 drives. I have 8 drives connected to the first USAS-L8i. They are set up in a raidz2, and I get 90-120MB/sec read and 60-75MB/sec write during my rsyncs from Linux machines (this Solaris box is only used to store backup data). I plan on removing the second USAS-L8i and connecting all 16 drives to the first one when I need more storage capacity. I have no doubt that it will work as intended; I will report to the list otherwise. -- Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 12, 2009 1:44:34 PM -0800 bdebel...@intelesyscorp.com wrote: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 ... Dropping a flush-cache command is just as bad as dropping a write. Not that it matters, but it seems obvious that this is wrong, or at least an exaggeration. Dropping a flush-cache just means that you have to wait until the device has quiesced before the data is consistent. Dropping a write is much, much worse. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss