Morning all. I've recently been trying to get ZFS deployed at work, a place that is awash with Dell hardware. My boss purchased a Dell MD1000 (a 15-disk SAS enclosure), fully populated with 500GB SATA disks, for use as a large dump space for several groups here.
I originally had it hooked up to a V880 using an LSI Logic SAS3442X (supported by the mpt driver, or LSI's itmpt driver) and ran ZFS directly on the disks. That worked ok for a while, but the disks themselves would randomly freak out and reset, losing I/O and doing strange things to the bus expanders in the enclosure. I don't think they liked the frequent cache syncs that ZFS likes to do.

Since I couldn't get that to work reliably, I looked at putting Solaris onto a Dell 2850 and using a PERC 5/E to talk to the disks, since that controller and those disks had worked very well together under a variety of loads. The PERC 5/E can be considered a rebadged LSI Logic MegaRAID SAS controller. Unfortunately I couldn't find a Solaris driver for it from Dell, LSI Logic, or Sun. So a few weeks ago I sat down and wrote one, called mfi (short for MegaRAID Firmware Interface). The driver is available from a subversion repository at https://svn.itee.uq.edu.au/repo/mfi.

Please be aware that I am very new to working on Solaris, especially with your build tools and in the kernel, so if any of it looks like it could be improved (especially the Makefile, which I know is awful) I would be more than happy to take patches.

Anyway, the driver seems to work very well, except for a bug in it somewhere that I can't find. When it is working, it works extremely well. The controller emulates SCSI for all accesses to the logical disks, so the driver simply has to pass the SCSI commands from the midlayer through to the card. There's a rough sketch of what that io path looks like at the end of this mail.

For those of you who like seeing it in action, here's some output.

prtconf -D chunk:

pci8086,370, instance #8 (driver name: pcie_pci)
    pci1028,1f01, instance #0 (driver name: mfi)
        sd, instance #4 (driver name: sd)

format output:

       1. c3t0d0 <DELL-PERC 5/E Adapter-1.00-5.45TB>
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci1028,[EMAIL PROTECTED]/[EMAIL PROTECTED],0

format> inq
Vendor:   DELL
Product:  PERC 5/E Adapter
Revision: 1.00

zfs bits:

$ zpool status -v
  pool: perc
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        perc        ONLINE       0     0     0
          c3t0d0    ONLINE       0     0     0

errors: No known data errors

$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
perc  9.98G  5.34T  27.5K  /perc

A zfs scrub on the volume shows the disk doing about 500MB/s in reads. I haven't done any more benchmarking, since I think it's pointless until the driver is reliable.

Anyway, back to the bug. At some random point the controller decides not to accept one or two commands, and I/O simply blocks from then on because those commands never complete. I have no idea why this happens, and I'm finding it really hard to debug, mostly because I have no idea what I'm doing when trying to debug on Solaris. My other problem is that it takes AGES for the machine to reboot, so adding little chunks of debug output is too time consuming. There are a couple of sketches of what I'm thinking of trying at the end of this mail.

So I have two purposes with this email:

1. To get help with debugging this. I need help.
2. To put the driver out there for people to use. I know it is really frustrating to have some fun hardware and not be able to use it with some fun software.

There are a couple of people I would like to thank for their help so far, but I'm not sure if they want to be publicly acknowledged. You guys know who you are, so thank you for putting up with me.

If anyone has any questions or suggestions, please feel free to contact me.
Remember, the code is up at https://svn.itee.uq.edu.au/repo/mfi/.

Regards,
dlg
