Some further developments: I noticed that in addition to the BIOS updates, there was a separate firmware update for the Marvell SE9230, which is one of the secondary controllers on the motherboard. So I grabbed it and updated everything.
Well, it certainly made a difference: a scrub hung within about 30 seconds of starting. Not quite the improvement I had been hoping for. However, it pointed the finger of suspicion at this controller in particular. I rearranged the disks so that none of the ones from the large pool were on this controller. In this configuration, the pool scrubs fine. It also scrubs noticeably faster (perhaps by 25-30%), even though four of the disks are now on slower 3 Gbit/s ports.

In this rearrangement, the SSD pool wound up on that controller instead. I tried scrubbing it, and it was very, very slow. So this controller is starting to smell pretty bad. Searching Google turns up a number of reported issues with it, and people get highly variable results depending on configuration options. There was an interesting Linux kernel quirk added for when the IOMMU is enabled, because the controller uses an undeclared second PCI function ID. I now have a collection of firmware versions and option settings to try.

Even if the controller is doing something wrong, it seems we're still losing track of commands without reporting any errors or warnings on the device. I've uploaded a crash dump in case there are any clues there.

-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com