Re: [zfs-discuss] TLER and ZFS
As a home user, here are my thoughts. WD = ignore (TLER issues, head-parking issues, etc.). I recently built up a server on OpenSolaris running Samsung 1.5TB drives. They are "green", but don't seem to have the irritating "features" found on the WD "green" drives. They are 5400RPM, but seem to transfer data plenty fast for a home setup. Current setup is 2x 6-disk raidz2.

Seek times obviously hurt, and the ZIL caused so many issues that I turned it off. Yes, I know I might lose some data doing that; yes, I'm OK with the tradeoff. The ZFS devs say I won't lose filesystem consistency, just that uncommitted writes could be lost, about 30 seconds of data in most cases. As the server is on a UPS and the rest of the network isn't (or is on small UPSes), it will be the last box online, so any clients will probably have their data saved before the server goes down. The next upgrade is a UPS that can tell the server power is out so it can shut down gracefully.

I'll probably get an SSD for slog/L2ARC at some point and re-enable the ZIL, but for now this does the job; SSDs that don't have similar issues when used as slog devices are rare and expensive. If the X25-E won't do...

This setup with the 5400RPM drives is significantly faster than the same box was with the 7200RPM 400GB Seagate drives. Of course, those 400GB drives are a few years old now, but I was pleasantly surprised by the speed I get out of the Samsungs.

--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks
NFS writes on ZFS blow chunks performance-wise. The only way to increase the write speed is with a slog, and the problem is that a "proper" slog device (one that doesn't lose transactions) does not exist at a reasonable price. The least expensive SSD that will work is the Intel X25-E, and even then you have to disable its write cache, which kills performance. And if you lose transactions in the ZIL, you may as well not have one.

Switching to a pool configuration with mirrors might help some, but you will still get hit with sync-write penalties on NFS. Before messing with that, try disabling the ZIL entirely and see if that's where your problem is. Note that running without a ZIL can cost you about 30 seconds of uncommitted data, and if the server crashes without the clients rebooting, the clients can end up with corrupted data (from their perspective). However, it solved the performance issue for me. If that works, you can then decide how important the ZIL is to you.

Personally, I like things to be correct, but that doesn't help me if performance is in the toilet. In my case, the server is on a UPS and the clients aren't, and most of the clients netboot anyway, so they will crash and have to be rebooted if the server goes down. So for me, the drawback is small while the performance gain is huge. That's not the case for everyone, and it's up to the admin to decide what they can live with. Thankfully, the next release of OpenSolaris will have the ability to set the ZIL on/off per filesystem.

Note that the ZIL only affects sync write speed, so if your workload isn't sync-heavy, it might not matter in your case; with NFS in the mix, though, it probably is. The ZFS on-disk data state is not affected by ZIL on/off, so your pool's data IS safe. You might lose some data that a client THINKS is safely written, but the ZFS pool will come back properly on reboot.
So the client will be wrong about what is and is not written, hence the possible "corruption" from the client's perspective. I run ZFS on two 6-disk raidz2 vdevs in the same pool and local performance is very good. With the ZIL enabled, NFS performance was so bad it was nearly unusable. With it disabled, I can saturate the single gigabit link, and performance in the Linux VM (xVM) running on that server improved significantly, to near-local speed, when using the NFS mounts to the main pool. My 5400RPM drives were not up to the ZIL's needs, though they are plenty fast in general, and a working slog was out of budget for a home server.
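For reference, here is roughly what the two approaches look like. The /etc/system tunable is the global switch available on the builds discussed in this thread; the per-filesystem `sync` property only exists in later builds, and the pool/filesystem names below are made up for illustration:

```shell
# Global ZIL disable (old builds): add this line to /etc/system
# and reboot. It affects every pool on the box.
#   set zfs:zil_disable = 1

# Later builds: disable synchronous write semantics per filesystem
# instead. "tank/nfs" is a placeholder for the NFS-exported dataset.
zfs set sync=disabled tank/nfs

# Check the setting, and put it back once a proper slog is in place:
zfs get sync tank/nfs
zfs set sync=standard tank/nfs
```

The per-filesystem route is the safer one since it limits the blast radius to the datasets where you've consciously accepted the ~30 seconds of data-loss exposure.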
Re: [zfs-discuss] New SSD options
> use a slog at all if it's not durable? You should disable the ZIL instead.

This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients are a non-issue. As long as the on-disk data is consistent, so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without the ZIL.

Considering that I can't buy a proper ZIL device at a cost I can afford, and an improper one is not worth much, I don't see a reason to bother with the ZIL at all. I'll just get a cheap large SSD for L2ARC, disable the ZIL, and call it a day. For my use, I'd want a device in the $200 range to even consider a slog. As nothing even remotely close to that price range exists that works properly at all, let alone with decent performance, I see no point in the ZIL for my application. The performance hit is just too severe to keep it without a slog, and there's no slog device I can afford that works properly, even ignoring performance.
Re: [zfs-discuss] New SSD options
> On May 19, 2010, at 2:29 PM, Don wrote:
> The data risk is a few moments of data loss. However, if the order of the
> uberblock updates is not preserved (which is why the caches are flushed)
> then recovery from a reboot may require manual intervention. The amount
> of manual intervention could be significant for builds prior to b128.

This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct? Outside of pulling the plug, that should solve power-related problems. Kernel panics should only be caused by hardware issues, which might corrupt the on-disk data anyway. Obviously software can and does fail, but the biggest problem I hear about with ZIL devices is their behavior in a sudden power-loss situation. It seems to me that UPS backup, along with starting a shutdown cycle before complete power failure, should prevent most issues. That should also help with issues like the X25-E not honoring cache flush; the UPS would give it time to finish the writes. Barring a firmware issue in the drive itself, that should work out about the same as a supercap.
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
Disable the ZIL and test again. NFS does a lot of sync writes, which kills performance. Disabling the ZIL (or using the synchronicity option, if a build with that ever comes out) will prevent that behavior and should get your NFS performance close to local. It's up to you if you want to leave it that way; there are reasons not to. NFS clients can get a corrupted view of the filesystem if the server goes down before a write flush completes. The ZIL prevents that problem. In my case, the clients aren't on a UPS while the server is, so it's not an issue. :)
Re: [zfs-discuss] Strategies for expanding storage area of home storage-server
When I did a similar upgrade a while back I went with #2: create a new 6-drive raidz2 pool, copy the data to it, verify the data, delete the old pool, then combine the old drives plus some new drives into another 6-disk raidz2 vdev in the new pool. Performance has been quite good, and the migration was very smooth. The other nice thing about this arrangement for a home user is that I now only need to upgrade 6 drives to get more space, rather than 12 as with option #1. To be clear, this is my current config:

        NAME          STATE     READ WRITE CKSUM
        raid          ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            c9t4d0    ONLINE       0     0     0
            c9t5d0    ONLINE       0     0     0
            c9t6d0    ONLINE       0     0     0
            c9t7d0    ONLINE       0     0     0
            c10t5d0   ONLINE       0     0     0
            c10t4d0   ONLINE       0     0     0
          raidz2-1    ONLINE       0     0     0
            c9t0d0    ONLINE       0     0     0
            c9t1d0    ONLINE       0     0     0
            c10t0d0   ONLINE       0     0     0
            c10t1d0   ONLINE       0     0     0
            c10t2d0   ONLINE       0     0     0
            c10t3d0   ONLINE       0     0     0
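For anyone wanting to repeat this, the migration is roughly the sequence below. Pool names are made up, device names are just examples, and you obviously want the scrub/verify step to come back clean before destroying the old pool:

```shell
# 1. Build the new 6-disk raidz2 pool on the new drives.
zpool create newpool raidz2 c9t4d0 c9t5d0 c9t6d0 c9t7d0 c10t5d0 c10t4d0

# 2. Copy everything over (snapshots and properties included with -R),
#    then verify with a scrub.
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs recv -F -d newpool
zpool scrub newpool
zpool status -v newpool     # must show 0 errors before proceeding

# 3. Retire the old pool and reuse its disks as a second
#    raidz2 vdev striped into the new pool.
zpool destroy oldpool
zpool add newpool raidz2 c9t0d0 c9t1d0 c10t0d0 c10t1d0 c10t2d0 c10t3d0
```

Note there's no way to undo the `zpool add`; once the second vdev is in, it's part of the pool for good.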
Re: [zfs-discuss] Replacement brackets for Supermicro UIO SAS cards....
Thanks! I might just have to order a few for the next time I take the server apart. Not that my bent up versions don't work, but I might as well have them be pretty too. :)
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
> I've got an OCZ Vertex 30gb drive with a 1GB stripe used for the slog
> and the rest used for the L2ARC, which for ~ $100 has been a nice
> boost to nfs writes.

What about the Intel X25-V? I know it will likely be fine for L2ARC, but what about ZIL/slog use?
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Travis Tabbal
>
> Oh, one more thing. Your subject says "ZIL/L2ARC" and your message says
> "I want to speed up NFS writes."
>
> ZIL (log) is used for writes.
> L2ARC (cache) is used for reads.
>
> I'd recommend looking at the ZFS Best Practices Guide.

At the end of my OP I mentioned that I was interested in L2ARC for dedupe. It sounds like the DDT can get bigger than RAM and slow things to a crawl. Not that I expect a lot from using an HDD for that, but I thought it might help. I'd like to get a nice SSD or two for this stuff, but that's not in the budget right now.
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
> If your clients are mounting "async" don't bother. If the clients are
> mounting async, then all the writes are done asynchronously, fully
> accelerated, and never any data written to ZIL log.

I've tried async; things run well until you get to the end of the job, then the process hangs until the write is complete. This was just with tar extracting to the NFS mount.
[zfs-discuss] Thoughts on drives for ZIL/L2ARC?
I have a few old drives here that I thought might help me a little for those uses, though not as much as a nice SSD would. I'd like to speed up NFS writes, and there have been some mentions that even a decent HDD can do this, though not to the level a good SSD will. The 3 drives are older LVD SCSI Cheetah drives, ST318203LW. I have 2 controllers I could use. One appears to be a RAID controller with a memory module installed, an Adaptec AAA-131U2; the memory module comes up on Google as a 2MB EDO DIMM, so I'm not sure that's worth anything to me. :) The other controller is an Adaptec 29160. It looks to be a 64-bit PCI card, but the machine it came from is only 32-bit PCI, as is my current machine.

What say the pros here? I'm concerned that the max data rate is going to be somewhat low with them, but the seek time should be good as they are 10K RPM (I think). The only reason I thought to use one for L2ARC is for dedupe; it sounds like L2ARC helps a lot there. This is for a home server, so all I'm really looking to do is speed things up a bit while I save up and look for a decent SSD option. However, if it's a waste of time, I'd rather find out before I install them.
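If the Cheetahs do turn out to be worth trying, wiring them in is simple. Device names below are placeholders; mirroring the log vdev is worth considering since on these builds a failed or missing unmirrored slog could make a pool painful to import:

```shell
# Dedicated log (slog) on one 10K SCSI disk,
# cache (L2ARC) on another:
zpool add tank log c4t0d0
zpool add tank cache c4t1d0

# Or mirror the log across two disks for safety:
zpool add tank log mirror c4t0d0 c4t2d0

# Log and cache devices show up as their own sections here:
zpool status tank
```

Cache devices can be removed again with `zpool remove` if the experiment doesn't pan out; log devices could not be removed on older builds, so the mirror option is the conservative one.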
Re: [zfs-discuss] Non-redundant zpool behavior?
Thanks. That's what I expected the case to be. Any reasons this shouldn't work for strictly backup purposes? Obviously, one disk down kills the pool, but as I only ever need to care if I'm restoring, that doesn't seem to be such a big deal. It will be a secondary backup destination for local machines like laptops that don't have redundant storage. The primary backups will still be hosted on the main server with 2 raidz2 arrays. The only downside I can see to this idea is that I was expecting it to be used as an offsite backup as well, so in a real disaster I might have only a single non-redundant copy of the data. That alone might be enough reason for me not to do it. After getting used to redundancy, it's hard to go back to not having it. :)
[zfs-discuss] Non-redundant zpool behavior?
I have a small stack of disks that I was considering putting in a box to build a backup server. It would only store data that is duplicated elsewhere, so I wouldn't really need redundancy at the disk layer. The biggest issue is that the disks are not all the same size. So I can't really do a raidz or mirror with them anyway. So I was considering just putting them all in one pool. My question is how does zpool behave if I lose one disk in this pool? Can I still access the data on the other disks? Or is it like a traditional raid0 and I lose the whole pool? Is there a better way to deal with this, using my old mismatched hardware? Yes, I could probably build a raidz by partitioning and such, but I'd like to avoid the complexity. I'd probably just use zfs send/recv to send snapshots over or perhaps crashplan.
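For what it's worth, the layout being asked about is just a plain pool of top-level disks (a dynamic stripe), and it does behave like RAID-0 for failures: one dead disk faults the whole pool. A sketch with made-up device names:

```shell
# Mismatched disks as independent top-level vdevs:
# capacities add up, but there is no redundancy at all.
zpool create backup c1t0d0 c1t1d0 c2t0d0

# Checksums still catch corruption, and you can ask ZFS to keep
# two copies of the blocks in especially important datasets
# (helps against bad sectors, not against a whole-disk loss):
zfs set copies=2 backup/laptops
```

So for a strictly-secondary backup target this can be acceptable, as long as a single disk failure taking out the whole pool is an understood tradeoff.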
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Supermicro USAS-L8i controllers. I agree with you: I'd much rather have the drives respond properly and promptly than save a little power, if saving power means I'm going to get strange errors from the array. And these are the "green" drives; they just don't seem to cause me any problems. The issues people have noted with WD have made me stay away from them, as just about every drive I own lives in some kind of RAID at some point in its life. I have a couple of laptop drives that are single; all desktops have at least a mirror. I'm a little nuts and would probably install mirrors in the laptops if there were somewhere to put them. :)
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
smartmontools doesn't work with my controllers. I can try it again when the 2 new drives I've ordered arrive. I'll try connecting to the motherboard ports and see if that works with smartmontools. I haven't noticed any sleeping with the drives. I don't get any lag accessing the array or any error messages about them disappearing.
Re: [zfs-discuss] I can't seem to get the pool to export...
On Sun, Jan 17, 2010 at 8:14 PM, Richard Elling wrote:
> On Jan 16, 2010, at 10:03 PM, Travis Tabbal wrote:
>
> > Hmm... got it working after a reboot. Odd that it had problems before
> > that. I was able to rename the pools and the system seems to be running
> > well now. Irritatingly, the settings for sharenfs, sharesmb, quota, etc.
> > didn't get copied over with the zfs send/recv. I didn't have that many
> > filesystems though, so it wasn't too bad to reconfigure them.
>
> What OS or build? I've had similar issues with b130 on all sorts of mounts
> besides ZFS.

OpenSolaris snv_129.
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
HD154UI/1AG01118 They have been great drives for a home server. Enterprise users probably need faster drives for most uses, but they work great for me.
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I've been having good luck with Samsung "green" 1.5TB drives. I have had 1 DOA, but I currently have 10 of them, so that's not so bad; in a purchase that size I've had one bad drive from just about every manufacturer. I've avoided WD for RAID because of the error-handling stuff kicking drives out of arrays; I don't know if that's still an issue. And with Seagate's recent record, I didn't feel confident in their larger drives. I was concerned about the 5400RPM speed being a problem, but I can read over 100MB/s from the array, and 95% of my use is over a gigabit LAN, so they are more than fast enough for my needs.

I just set up a new array with them, 6 in raidz2. Rebuild times on drives this size are high enough that I decided the extra parity was worth the cost, even for a home server. I need 2 more drives, then I'll migrate my other 4 from the older array into another 6-drive raidz2 and add it to the pool.

I have decided to treat HDDs as completely untrustworthy. When I get new drives, I test them by creating a temporary pool in a mirror config and filling the drives up with data copied from the primary array, then running a scrub. If that reports no errors, and there are no other errors in dmesg, wait a week or so and do another scrub test. I found a bad SATA hot-swap backplane and a bad drive this way. There are probably faster ways, but this works for me.
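The burn-in procedure above, written out as commands. Pool, device, and path names are made up; substitute your own:

```shell
# Temporary mirror pool on the two new drives.
zpool create testpool mirror c5t0d0 c5t1d0

# Fill it with real data copied from the main pool, then scrub.
cp -r /raid/media /testpool/
zpool scrub testpool

# Any non-zero READ/WRITE/CKSUM counters mean a bad drive,
# cable, or backplane; also check dmesg for driver errors.
zpool status -v testpool

# Repeat the scrub a week later; if still clean, tear down and
# put the drives into the real array.
zpool destroy testpool
```

The nice part of using a mirror for the test is that a checksum error points at the specific disk, since the other side of the mirror shows which copy was bad.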
Re: [zfs-discuss] I can't seem to get the pool to export...
Hmm... got it working after a reboot. Odd that it had problems before that. I was able to rename the pools and the system seems to be running well now. Irritatingly, the settings for sharenfs, sharesmb, quota, etc. didn't get copied over with the zfs send/recv. I didn't have that many filesystems though, so it wasn't too bad to reconfigure them.
[zfs-discuss] I can't seem to get the pool to export...
r...@nas:~# zpool export -f raid
cannot export 'raid': pool is busy

I've disabled all the services I could think of. I don't see anything accessing it. I also don't see any of the filesystems mounted with mount or "zfs mount". What's the deal? This is not the rpool, so I'm not booted off it or anything like that. I'm on snv_129.

I'm attempting to move the main storage to a new pool. I created the new pool and used "zfs send | zfs recv" for the filesystems. That's all fine. The plan was to export both pools and use the import to rename them. I've got the new pool exported, but the older one refuses to export. Is there some way to get the system to tell me what's using the pool?
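For a "pool is busy" export, these are the usual things to check; fuser is the standard way to find the process holding files open. Paths are examples for this pool:

```shell
# Anything with files open under the pool's mountpoint?
# fuser -c reports PIDs using the mounted filesystem.
fuser -c /raid

# Any filesystems from the pool still mounted?
mount | grep raid

# Zvols in use can also hold a pool busy even when nothing
# shows as mounted; swap or dump on a zvol is a classic one.
swap -l
dumpadm

# Shares (NFS/SMB/iSCSI) should be torn down by export, but
# checking the properties can't hurt:
zfs get -r sharenfs,sharesmb,shareiscsi raid
```

If none of those turn anything up, a reboot clearing it (as happened later in this thread) suggests a stale hold in the kernel rather than a real user of the pool.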
Re: [zfs-discuss] raidz data loss stories?
> Everything I've seen says you should stay around 6-9 drives for raidz, so
> don't do a raidz3 with 12 drives. Instead make two raidz3s with 6 drives
> each (which is (6-3)*1.5 * 2 = 9 TB array.)

So the question becomes: why? If it's performance, I can live with lower IOPS and max throughput. If it's reliability, I'd like to hear why. I would think that the acceptable number of devices in a raidz would scale somewhat with the number of drives used for parity, so I would expect a sliding scale like the one mentioned before regarding disk size vs. raidz level. For example:

3-4 drives: raidz1
4-8 drives: raidz2
8+ drives: raidz3

In practice, I would expect some kind of chart using both the number of devices and the size of the devices to determine the proper raidz level. Perhaps I'm way off base though. Note that I don't really have a problem doing 2 vdevs, but I would think that raidz2 would be acceptable in that configuration. The benefit of that config for me is that I could create a parallel 6-disk vdev to copy my existing data to, then add the second vdev after the initial file copy/scrub. I would need fewer disks to complete the transition.

> As for whether or not to do raidz, for me the issue is performance. I
> can't handle the raidz write penalty. If I needed triple-drive protection,
> a 3-way mirror setup would be the only way I would go. I don't yet quite
> understand why a 3+ drive raidz2 vdev is better than a 3-drive mirror
> vdev? Other than a 5-drive setup is 3 drives of space when a 6-drive
> setup using 3-way mirrors is only 2 drives of space.

I've already stipulated that performance is not the primary concern. 100MB/sec with reasonable random I/O for a max of 5 clients is more than enough. My existing raidz is more than fast enough for my needs, and I have 5400RPM drives in there.

I'd be very interested to hear an expert opinion on this. Given, say, 6 disks: what advantage in reliability, if any, would a raidz3 have vs. a striped pair of 3-way mirrors? Obviously the raidz3 has 1 disk worth of extra space, but we're talking about reliability here. I would guess performance would be higher with the mirrors.

With all of my comments, please keep in mind that I am not a huge enterprise customer with loads of money to spend on this. If I were, I'd just buy Thumpers. I'm a home user with a decent fileserver.
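The capacity side of the comparison is easy to check. Integer math in tenths of a TB avoids floating point; drive counts are the ones from the quoted example:

```shell
# Two 6-disk raidz3 vdevs of 1.5 TB drives vs. the quoted 9 TB figure.
drives=6; parity=3; size_tenths=15            # 1.5 TB per drive, in tenths
vdev_tenths=$(( (drives - parity) * size_tenths ))
pool_tenths=$(( vdev_tenths * 2 ))            # two such vdevs in one pool
echo "per vdev: $(( vdev_tenths / 10 )).$(( vdev_tenths % 10 )) TB"
echo "pool:     $(( pool_tenths / 10 )).$(( pool_tenths % 10 )) TB"

# Same 6 disks as a striped pair of 3-way mirrors: 2 disks of space.
mirror_tenths=$(( 2 * size_tenths ))
echo "mirror pair: $(( mirror_tenths / 10 )).$(( mirror_tenths % 10 )) TB"
```

So 4.5 TB per raidz3 vdev (9 TB for two) against 3 TB for the mirror layout: the raidz3 really does buy exactly one disk's worth of extra space per 6 disks, and the rest of the argument is about reliability and IOPS.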
Re: [zfs-discuss] raidz data loss stories?
Interesting discussion. I know the bias here is generally toward enterprise users; I was wondering if the same recommendations hold for home users, who are generally more price-sensitive. I'm currently running OpenSolaris on a system with 12 drives. I had split them into three 4-disk raidz1 vdevs. This made some sense at the time, as I can upgrade 4 disks at a time as new sizes come out. However, with 8 of the disks currently being 1.5TB, I'm getting concerned about this strategy. While important data is backed up, a loss of the server data would be very irritating.

My next thought was to get more drives and run a single raidz3 vdev of 12x1.5TB. More space than I need for quite a while (since I can't add just a few drives), and triple parity for protection. I'd need a few extra drives to hold the data while I rebuild the main array, so I'd have cold spares available that I would use for backing up critical data from the server; they would see use and scrubs, not just sit on the shelf. Access is over a gigE network, so I don't need more performance than that. I have read that the overall speed of a vdev is approximately the speed of a single device in the vdev, and in this case that is more than fast enough.

I'm curious what the experts here think of this new plan. I'm pretty sure I know what you all think of the old one. :) Do you recommend swapping spare drives into the array periodically? It seems like it wouldn't really be any better than running a scrub over the same period, but I've heard of people doing it on hardware RAID controllers.
Re: [zfs-discuss] How can we help fix MPT driver post build 129
To be fair, I think it's obvious that Sun people are looking into it and that users are willing to help diagnose and test. There were requests for particular data in those threads you linked to; have you sent yours? It might help them find a pattern in the errors. I understand the frustration that it hasn't been fixed in the couple of builds since they became aware of it, but it could be a very tricky problem. It also sounds like it's not reproducible on Sun hardware, so they have to get cards and such as well. It's also less urgent now that they have identified a workaround that works for most of us. While disabling MSIs is not optimal, it does help a lot.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Perhaps. As I noted, though, it also occurs on the onboard NVidia SATA controller when MSI is enabled. I had already put a line in /etc/system to disable MSIs for that controller, per a forum thread, and it worked great. I'm now running with all MSIs disabled via xVM, as the mpt controller is giving me the same problems. As it's happening on totally different controller types, cable types, and drive types, I have to go with a software issue. I know for sure the NVidia issue didn't come up on 2009.06; it makes the system take forever to boot, so it's very noticeable. It happened when I first went to dev builds, I want to say around b118. I updated for better xVM support for newer Linux kernels.

The NVidia controller causes similar log messages: command timeouts. Disabling MSIs fixes it as well. The motherboard is an Asus M4N82 Deluxe, NVIDIA nForce 980a SLI chipset. I expect the root cause is the same, and I would guess that something is causing the drivers to miss or not receive some interrupts. However, my programming experience at this level is limited, so perhaps I'm misdiagnosing the issue.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Just an update, my scrub completed without any timeout errors in the log. XVM with MSI disabled globally.
Re: [zfs-discuss] mpt errors on snv 127
If someone from Sun will confirm that it should work to use the mpt driver from 2009.06, I'd be willing to set up a BE and try it. I still have the snapshot from my 2009.06 install, so I should be able to mount that and grab the files easily enough.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
> (1) disabling MSI support in xVM makes the problem go away

Yes here.

> (6) mpt(7d) without MSI support is sloow.

That does seem to be the case. It's not so bad overall, and at least the performance is consistent. It would be nice if this were improved.

> For those of you who have been running xVM without MSI support,
> could you please confirm whether the devices exhibiting the problem
> are internal to your host, or connected via jbod. And if via jbod,
> please confirm the model number and cables.

Direct connect. The drives are in hot-swap racks, but they are passive devices; no expanders or anything like that in there. In case it's interesting, the racks are StarTech HSB430SATBK units. I'm using SAS-to-SATA breakout cables to connect them. I have tried different lengths with the same result.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
> o The problems are not seen with Sun's version of this card

Unable to comment, as I don't have a Sun card here. If Sun would like to send me one, I would be willing to test it against the cards I do have. I'm running Supermicro USAS-L8i cards (LSI 1068e based).

> o The problems are not seen with LSI's version of the driver

I haven't tried it, as comments from Sun staff here have indicated that it's not a good idea.

> o The problems are seen with the latest LSI firmware

Yes. When I checked, the LSI site was listing the version I see at boot.

> o Errors still occur if MSIs are disabled.

I haven't seen any command timeout errors since disabling MSIs. I tried using the command to disable MSIs only for the mpt driver, but then I get a similar error from the NVidia driver, as it has my boot drives. It seems to me that the issue has more in common with MSIs than with the drivers themselves. I have a scrub scheduled for 12/1, so I can check the logs after that to see if the problem reappears. My other tests have not triggered the issue since disabling MSIs. I'm currently running with "set xpv_psm:xen_support_msi = -1".

I am not using any jbod enclosures. My setup uses SAS-to-SATA breakout cables connected directly to the drives. I have tried different cables and lengths. The timeouts affected drives in a seemingly random fashion; I would get timeouts on both controllers and on every drive over time. I have never had command errors here, just the timeouts.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
> Travis Tabbal wrote:
> > I have a possible workaround. Mark Johnson has been emailing me today
> > about this issue and he proposed the following:
> >
> >> You can try adding the following to /etc/system, then rebooting...
> >> set xpv_psm:xen_support_msi = -1
>
> I am also running XVM, and after modifying /etc/system and rebooting, my
> zpool scrub test is running along merrily with no hangs so far, where
> usually I would expect to see several by now.
>
> Can the other folks who have seen this please test and report back? I'd
> hate to think we solved it only to discover there were overlapping bugs.
>
> Fingers crossed, and many thanks to those who have worked to track this
> down!

Nice to see we have one confirmed report that things are working. Hopefully we get a few more! Even if it's just a workaround until a real fix makes it in, it gets us running.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
> > On Nov 23, 2009, at 7:28 PM, Travis Tabbal wrote:
> > > I have a possible workaround. Mark Johnson has been emailing me today
> > > about this issue and he proposed the following:
> > >
> > >> You can try adding the following to /etc/system, then rebooting...
> > >> set xpv_psm:xen_support_msi = -1
> >
> > would this change affect systems not using XVM? we are just using
> > these as backup storage.

Probably not. Are you seeing the issue without XVM installed? We had one other user report that the issue went away when they removed XVM, so I had thought it wouldn't affect other users. If you are getting the same issue without XVM, there may be overlapping bugs in play. Someone at Sun might be able to tell you how to disable MSIs on the controller; someone told me how to do it for the NVidia SATA controller when there was a bug in that driver, so I would think there is a way to do it for the mpt driver.
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
I have a possible workaround. Mark Johnson has been emailing me today about this issue and he proposed the following:

> You can try adding the following to /etc/system, then rebooting...
> set xpv_psm:xen_support_msi = -1

I have been able to format a ZVOL container from a VM 3 times while other activity is going on the system, and it's working. I think performance is down a bit, but it's still acceptable. More importantly, it does so without killing the server; I would get the stall every time I tried this test before. So at least one case seems to be helped by doing this. I'll watch the server over the next few days to see if it stays improved. He mentioned that a fix for MSI handling in XVM is being worked on that might make it into b129, which could fix this problem.
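For anyone who wants to try the same thing, here is roughly what the change looks like (a sketch only; back up /etc/system first, and note the setting takes effect only after a reboot):

```shell
# Keep a copy of /etc/system in case the change needs to be backed out
cp /etc/system /etc/system.bak

# Append the XVM MSI workaround Mark suggested
echo 'set xpv_psm:xen_support_msi = -1' >> /etc/system

# Reboot so the new setting is read at boot
init 6
```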
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
> I will give you all of this information on monday.
> This is great news :)

Indeed. I will also be posting this information when I get to the server tonight; perhaps it will help. I don't think I want to try using that old driver, though; it seems too risky for my taste. Is there a command to get the disk firmware revision from OpenSolaris while it's booted? I know of some boot CDs that can get to it, but I'm unsure about accessing it while the server is running.
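One way this should be readable on a live system is via iostat's device-error report, which prints the SCSI inquiry data including the firmware revision (a sketch; the exact output format varies by release and disk):

```shell
# Print per-device error statistics; the inquiry data on each disk
# includes Vendor, Product, and the firmware Revision string
iostat -En

# Each disk gets an entry along these lines (format varies by release):
#   c1t0d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
#   Vendor: ATA  Product: <model>  Revision: <firmware rev>  ...
```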
Re: [zfs-discuss] SNV_125 MPT warning in logfile
> The latter, we run these VMs over NFS anyway and had ESXi boxes under
> test already. we were already separating "data" exports from "VM"
> exports. We use an in-house developed configuration management/bare
> metal system which allows us to install new machines pretty easily. In
> this case we just provisioned the ESXi VMs to new "VM" exports on the
> Thor whilst re-using the data exports as they were...

Thanks for the info. Unfortunately, I need this box to do double duty and run the VMs as well. The hardware is capable; this issue with XvM and/or the mpt driver just needs to get fixed. Other than that, things are running great with this server.
Re: [zfs-discuss] SNV_125 MPT warning in logfile
> > I'm running nv126 XvM right now. I haven't tried it without XvM.
>
> Without XvM we do not see these issues. We're running the VMs through
> NFS now (using ESXi)...

Interesting. It sounds like it might be an XvM-specific bug. I'm glad I mentioned that in my bug report to Sun; hopefully they can duplicate it. I'd like to stick with XvM, as I've spent a fair amount of time getting things working well under it. How did your migration to ESXi go? Are you using it on the same hardware, or did you switch that server to an NFS server and run the VMs on another box?
Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding
> Have you tried wrapping your disks inside LVM metadevices and then used
> those for your ZFS pool?

I have not tried that. I could try it with my spare disks, I suppose. I avoided LVM as it didn't seem to offer me anything ZFS/zpool didn't.
Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding
> What type of disks are you using?

I'm using SATA disks with SAS-to-SATA breakout cables. I've tried different cables, as I have a couple of spares. mpt0 has 4x 1.5TB Samsung "green" drives; mpt1 has 4x 400GB Seagate 7200 RPM drives. I get errors from both adapters. Each adapter has an unused SAS channel available; if I can get this fixed, I'm planning to populate those as well.
Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding
I submitted a bug on this issue. It looks like you can reference other bugs when you submit one, so everyone having this issue could link mine and submit their own hardware config. It sounds like it's widespread, though, so I'm not sure if that would help or hinder; I'd hate to bury the developers/QA team under a mountain of duplicate reports.

CR 6900767
Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding
On Wed, Nov 11, 2009 at 10:25 PM, James C. McPherson wrote:
>
> The first step towards "acknowledging" that there is a problem
> is you logging a bug in bugs.opensolaris.org. If you don't, we
> don't know that there might be a problem outside of the ones
> that we identify.
>

I apologize if I offended by not knowing the protocol. I thought the forums were watched and the bug tracker updated by people at Sun; I didn't think normal users had access to submit bugs. Thank you for the reply. I have submitted a bug on the issue with all the information I think might be useful. If someone at Sun would like more information, output from commands, or testing, I would be happy to help. I was not given a bug number by the system; I assume those are assigned if the bug is deemed worthy of further consideration.
Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding
> Have you tried another SAS cable?

I have. Two identical SAS cards, different cables, different disks (brand, size, etc.). I get the errors on random disks in the pool. I don't think it's hardware related, as there have been a few reports of this issue already.
Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding
> Hi, you could try the LSI itmpt driver as well, it seems to handle this
> better, although I think it only supports 8 devices at once or so.
>
> You could also try a more recent version of OpenSolaris (123 or even
> 126), as there seem to be a lot of fixes regarding the mpt driver
> (which still seems to have issues).

I won't speak for the OP, but I've been seeing this same behaviour on 126 with LSI 1068E-based cards (Supermicro USAS-L8i). For the LSI driver, how does one install it? I'm new to OpenSolaris and don't want to mess it up. It looked to be very old; is Solaris backward compatibility that good? It would be really nice if Sun would at least acknowledge the bug and say whether they can reproduce it. I'm happy to supply information and test things if it will help; I have some spare disks I can attach to one of these cards to test driver updates and such. It sounds like people with Sun hardware are experiencing this as well.
Re: [zfs-discuss] SNV_125 MPT warning in logfile
I am also running 2 of the Supermicro cards. I just upgraded to b126 and it seems improved. I am running a large file copy locally and get these warnings in the dmesg log. When I do, I/O seems to stall for about 60 seconds. It comes back up fine, but it's very annoying. Any hints?

I have 4 disks per controller right now: different brands, sizes, everything. New SATA fanout cables and no expanders. The drives on mpt0 and mpt1 are completely different (4x 400GB Seagate drives, 4x 1.5TB Samsung drives), and I get the problem from both controllers. I didn't notice this until about b124. I can reproduce it with rsync copying files locally between ZFS filesystems, even with --bwlimit=1 (10MB/sec); keeping the limit low does seem to help.

---
Oct 31 23:05:32 nas scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@3/pci15d9,a...@0 (mpt1):
Oct 31 23:05:32 nas     Disconnected command timeout for Target 7
Oct 31 23:09:42 nas scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@2/pci15d9,a...@0 (mpt0):
Oct 31 23:09:42 nas     Disconnected command timeout for Target 1
Oct 31 23:16:23 nas scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@2/pci15d9,a...@0 (mpt0):
Oct 31 23:16:23 nas     Disconnected command timeout for Target 3
Oct 31 23:18:43 nas scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@3/pci15d9,a...@0 (mpt1):
Oct 31 23:18:43 nas     Disconnected command timeout for Target 6
Oct 31 23:27:24 nas scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci10de,7...@10/pci10de,5...@0/pci10de,5...@3/pci15d9,a...@0 (mpt1):
Oct 31 23:27:24 nas     Disconnected command timeout for Target 7
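If anyone wants to catch these as they happen rather than digging through dmesg afterwards, something like this should work (a sketch; /var/adm/messages is the usual syslog destination on OpenSolaris, but check your syslog.conf):

```shell
# Follow the system log and show only mpt warnings as they arrive,
# e.g. the "Disconnected command timeout for Target N" lines above
tail -f /var/adm/messages | grep mpt
```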
Re: [zfs-discuss] zpool with very different sized vdevs?
Hmm... I expected people to jump on me yelling that it's a bad idea. :) How about this: can I remove a vdev from a pool if the pool still has enough space to hold the data? Could I add it in and mess with it for a while without losing anything? I would expect the system to resilver the data onto the remaining vdevs, or tell me to go jump off a pier. :)
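One safe way to find out without risking real disks is to experiment on a throwaway pool built from file-backed vdevs (a sketch; pool and file names are made up, and as far as I know zpool remove at this point only handles spares, cache, and log devices, so I'd expect the remove to be refused):

```shell
# Create four small backing files and a raidz pool from three of them
mkfile 100m /tmp/v1 /tmp/v2 /tmp/v3 /tmp/v4
zpool create testpool raidz /tmp/v1 /tmp/v2 /tmp/v3

# Adding a bare vdev mismatches the raidz replication level,
# so zpool warns and wants -f before it will do it
zpool add -f testpool /tmp/v4

# This is the part I'd expect to fail: top-level data vdevs
# can't be removed, only spares/cache/log devices
zpool remove testpool /tmp/v4

# Clean up the experiment
zpool destroy testpool
rm /tmp/v1 /tmp/v2 /tmp/v3 /tmp/v4
```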
Re: [zfs-discuss] bewailing of the n00b
> - How can I effect OCE with ZFS? The traditional 'back up all the data
> somewhere, add a drive, re-establish the file system/pools/whatever,
> then copy the data back' is not going to work because there will be
> nowhere to temporarily 'put' the data.

Add devices to the pool, preferably in mirror or raidz configurations. If you just add bare devices to the pool, you are running RAID-0 with no redundancy. You cannot add devices to an existing raidz vdev, as mentioned, but you can add more raidz or mirror vdevs, and you can also replace devices with larger ones. It would be nice to be able to grow a raidz for home users like us; maybe we'll see it someday. For now, the capabilities we do have make it reasonable to deal with.

> - Concordantly, is ZFS affected by a RAID card that supports OCE? Or is
> this to no advantage?

Don't bother. Spend the money on more RAM and drives. :) Do get a nice controller, though. Supermicro makes a few nice units; I'm using 2 AOC-USAS-L8i cards. They work great, though you do have to mod the mounting bracket to get them into a standard case. These are based on LSI cards; I just found them cheaper than the same LSI-branded card. Avoid the cheap $20 4-port jobs. I've had a couple of them die already. Thankfully, I didn't lose any data... I think... no ZFS on that box.

> - RAID5/6 with ZFS: As I understand it, ZFS with raidz will provide the
> data/drive redundancy I seek [home network, with maybe two simultaneous
> users on at least a p...@1ghz/1Gb RAM storage server], so obtaining a
> RAID controller card is unnecessary/unhelpful. Yes?

Correct. Though I would increase the RAM personally; it's so cheap these days. My home fileserver has 8GB of ECC RAM, though I'm also running Xen VMs, so some of that is used for those. You can even do triple-parity raidz (raidz3) with ZFS now, so you could lose 3 drives without any data loss. That's for those who want really high availability, or really big arrays, I suppose.
I'm running 4x 1.5TB in a raidz1 with no problems, though I do plan to keep a spare around. I'll just use it to store backups to start with; if a drive goes bad, I'll drop it in and do a zpool replace. Don't worry about the command line. The ZFS commands are pretty short and simple; read up on zpool and zfs, the two commands you'll use the most for managing ZFS. There's also the ZFS Best Practices Guide if you haven't seen it, with useful advice in there.
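To make the growth options above concrete, the two common paths look something like this (a sketch only; "tank" and the cXtYd0 device names are placeholders for your own pool and disks):

```shell
# Option 1: add a whole second raidz vdev; the pool then
# stripes writes across both vdevs
zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

# Option 2: grow in place by swapping in bigger disks one at a
# time, letting each resilver finish before starting the next;
# the extra capacity shows up once every disk has been replaced
zpool replace tank c1t0d0 c3t0d0
zpool status tank   # watch resilver progress
```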
[zfs-discuss] zpool with very different sized vdevs?
I have a new array of 4x 1.5TB drives running fine. I also have the old array of 4x 400GB drives in the box on a separate pool for testing. I was planning to have the old drives just be a backup file store, so I could keep snapshots and such over there for important files. I was wondering if it makes any sense to add the older drives to the new pool instead. Reliability might be lower as they are older drives, so if I were to lose 2 of them, things could get ugly. I'm just curious whether it would make any sense to do something like this.
Re: [zfs-discuss] White box server for OpenSolaris
> I am after suggestions of motherboard, CPU and RAM. Basically I want
> ECC RAM and at least two PCI-E x4 channels, as I want to run 2 x
> AOC-USAS-L8i cards for 16 drives.

Asus M4N82 Deluxe. I have one running with 2 USAS-L8i cards just fine. I don't have all the drives loaded in yet, but the cards are detected and can use the drives I do have attached. I currently have 8GB of ECC RAM on the board and it's working fine; the ECC options in the BIOS are enabled, and it reports ECC enabled at boot. It has 3 PCIe x16 slots; I have a graphics card in the other slot and an Intel e1000g card in the PCIe x1 slot. The onboard peripherals all work, with the exception of the onboard AHCI ports being buggy in b123 under xVM. Not sure what that's about; I posted on the main discussion board but haven't heard whether it's a known bug or whether it will be fixed in the next version. It would be nice, as my boot drives are on that controller. 2009.06 works fine, though. The CPU is a Phenom II X3 720, probably overkill for fileserver duties, but I also want to run some VMs for other things, which is how I found the xVM bug.