Morning all,

I've recently been trying to get zfs into deployment at work, a place
awash with Dell hardware. My boss purchased a Dell MD1000 (a 15-disk
SAS enclosure) fully populated with 500GB SATA disks for use as a
large dump space for several groups here.

I originally had it hooked up to a v880 using an LSI Logic SAS3442X
(supported by mpt or LSI's itmpt driver), and ran zfs directly on the
disks. That worked OK for a while, but the disks would randomly freak
out and reset, losing I/O and doing strange things to the bus
expanders in the enclosure. I don't think they liked the frequent
cache syncs that zfs likes to do. Because I couldn't get it to work
reliably, I looked at putting Solaris onto a Dell 2850 and using a
PERC 5/E to talk to the disks, since that controller and those disks
had worked very well together under a variety of loads. The PERC 5/E
is essentially a re-badged LSI Logic MegaRAID SAS controller.

Unfortunately I couldn't find a Solaris driver for that controller,
either from Dell, LSI Logic, or Sun. So a few weeks ago I sat down and
wrote a driver for it called mfi (short for MegaRAID Firmware
Interface).

The driver is available from a subversion repository up at
https://svn.itee.uq.edu.au/repo/mfi. Please be aware that I am very new
to working in Solaris, especially with your build tools and in the
kernel. So if any of it looks like it could be improved (especially
the Makefile; I know it is awful), I would be more than happy to take
patches to fix it.

Anyway, the driver seems to work very well, except there is a bug in it
somewhere which I can't find.

When it is working, it works extremely well. The controller emulates
SCSI for all accesses to its logical disks, so the driver simply has
to pass the SCSI commands from the midlayer through to the card.
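
To give a feel for what that pass-through amounts to, here is a minimal
sketch of wrapping a CDB in an MFI-style frame. The structure, field,
and constant names below are my own placeholders for illustration; the
actual driver in the repository may look quite different.

/*
 * Sketch only: the frame layout, names, and opcode value here are
 * assumptions for illustration, not the real driver's definitions.
 */
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

#define MFI_CMD_LD_SCSI_IO      0x03    /* assumed: SCSI I/O to a logical disk */

struct mfi_pass_frame {                 /* assumed frame layout */
        uint8_t         mpf_cmd;        /* MFI_CMD_LD_SCSI_IO */
        uint8_t         mpf_target;     /* logical disk number */
        uint8_t         mpf_cdb_len;    /* valid bytes in mpf_cdb */
        uint8_t         mpf_sge_count;  /* scatter/gather entries that follow */
        uint32_t        mpf_context;    /* driver cookie, echoed on completion */
        uint32_t        mpf_data_len;   /* transfer length in bytes */
        uint8_t         mpf_cdb[16];    /* the CDB handed down by the midlayer */
};

/*
 * The midlayer hands the HBA driver a fully formed CDB, so all the
 * driver really does is copy it into a frame and hand the frame to
 * the card.
 */
static void
mfi_frame_scsi(struct mfi_pass_frame *f, int ld, const uint8_t *cdb,
    uint8_t cdblen, uint32_t datalen, uint32_t context)
{
        bzero(f, sizeof (*f));
        f->mpf_cmd = MFI_CMD_LD_SCSI_IO;
        f->mpf_target = (uint8_t)ld;
        f->mpf_cdb_len = cdblen;
        f->mpf_data_len = datalen;
        f->mpf_context = context;
        bcopy(cdb, f->mpf_cdb, cdblen);
        /* DMA binding of the data buffer and the register write that
         * posts the frame are left out of the sketch. */
}

Everything else in the data path is setting up DMA for the buffer and
telling the card the frame is ready.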

For those of you who like seeing it in action, here's some output:

prtconf -D chunk:

            pci8086,370, instance #8 (driver name: pcie_pci)
                pci1028,1f01, instance #0 (driver name: mfi)
                    sd, instance #4 (driver name: sd)

format output:

       1. c3t0d0 <DELL-PERC 5/E Adapter-1.00-5.45TB>
          /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci1028,[EMAIL PROTECTED]/[EMAIL PROTECTED],0

format> inq
Vendor:   DELL    
Product:  PERC 5/E Adapter
Revision: 1.00

zfs bits:

[EMAIL PROTECTED] mfi$ zpool status -v
  pool: perc
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        perc        ONLINE       0     0     0
          c3t0d0    ONLINE       0     0     0

errors: No known data errors
[EMAIL PROTECTED] mfi$ zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
perc                  9.98G  5.34T  27.5K  /perc

A zfs scrub on the volume shows the disk doing about 500meg a second
in reads. I haven't done any more benchmarking since I think it is
pointless till the driver is reliable.

Anyway, back to the bug.

At a random point the controller decides not to accept one or two
commands, and I/O simply blocks from that point on because those
commands never complete. I have no idea why this happens, and I'm
finding it really hard to debug, mostly because I have no idea what
I'm doing when debugging on Solaris. My other problem is that it
takes AGES for the machine to reboot, so adding little chunks of
debug at a time is too time consuming.
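
To give an idea of what I mean by little chunks of debug, something
like the watchdog below is the kind of thing involved: a periodic
callback that logs any command the card has sat on for too long, so I
don't have to reboot just to see what's stuck. The mfi_softc and
mfi_cmd structures here are placeholders made up for this sketch, not
the driver's real data structures.

/*
 * Sketch only: the per-command and per-instance structures are
 * invented for illustration; only the DDI calls (timeout(9F),
 * ddi_get_lbolt(9F), drv_usectohz(9F), cmn_err(9F)) are real.
 */
#include <sys/types.h>
#include <sys/conf.h>
#include <sys/ksynch.h>
#include <sys/cmn_err.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

#define MFI_MAX_CMDS    128

struct mfi_cmd {                        /* placeholder per-command state */
        clock_t         mc_start;       /* lbolt when the frame was posted */
        int             mc_busy;        /* still outstanding on the card */
        uint32_t        mc_context;     /* frame context/index */
};

struct mfi_softc {                      /* placeholder per-instance state */
        kmutex_t        sc_mtx;
        struct mfi_cmd  sc_cmds[MFI_MAX_CMDS];
        timeout_id_t    sc_watchdog;
};

static void
mfi_watchdog(void *arg)
{
        struct mfi_softc *sc = arg;
        clock_t now = ddi_get_lbolt();
        int i;

        mutex_enter(&sc->sc_mtx);
        for (i = 0; i < MFI_MAX_CMDS; i++) {
                struct mfi_cmd *mc = &sc->sc_cmds[i];

                /* complain about anything outstanding for more than ~10s */
                if (mc->mc_busy && now - mc->mc_start > drv_usectohz(10000000))
                        cmn_err(CE_NOTE, "mfi: cmd %u stuck for %ld ticks",
                            mc->mc_context, (long)(now - mc->mc_start));
        }
        mutex_exit(&sc->sc_mtx);

        /* re-arm so the check keeps running without a reboot */
        sc->sc_watchdog = timeout(mfi_watchdog, sc, drv_usectohz(5000000));
}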

So I have two purposes with this email:

1. To get help with debugging this. I need help.
2. To put the driver out there for people to use. I know it is really
   frustrating to have some fun hardware and not be able to use it
   with some fun software.

There are a couple of people who I would like to thank for their help so
far with this, but I'm not sure if they want to be publicly acknowledged.
You guys know who you are, so thank you for putting up with me.

If anyone has any questions or suggestions for me, please feel free to
contact me.

Remember, the code is up at https://svn.itee.uq.edu.au/repo/mfi/.

Regards,
dlg