On 3/8/2011 12:22 AM, Andrew Hume wrote:
> While I am no stranger to large data, I have found myself outside my
> comfort zone at work. The department I have joined (I lateralled within
> research) has traditionally used Sun as mid-range storage and Hitachi
> as their high-end. We now need to look at multi-PB disk systems (say
> 2-5PB), which I have always thought of as a different market, with
> players like Panasas. As usual, we care rather less about IOPS and more
> about bandwidth and $ per TB.
> Anyone (doug??) with comments pro or con for this market?
> andrew
Isilon is worth looking at, as others have said. They have some
interesting technology for rebuilds, scalability, and balancing that
makes incremental purchase as simple as buying another brick, but they
do have scalability limits tied to the size of the units you purchase
and the largest cluster you can buy. Different people work around this
by partitioning their data sets. They also have some newer features
coming out soon with respect to tiering.
You should also talk to DDN, with or without GPFS. They are cheaper
than Isilon by a fair margin; with 2TB disks you can put ~1PB in a
single 19" rack. Their redundancy is good, and the price is very good
for raw block storage. You can use IB or FC for host connectivity, and
then you have several options: use the DDN purely for block storage and
do your own thing, put a traditional filesystem on it, or put a cluster
filesystem on it.

We use GPFS for the cluster filesystem because it has a lot of very
useful data-tiering features that make backups and finding files much
easier, and that let you keep certain file types, directories, or
extensions on different tiers of storage. We put all of our metadata on
a TMS to make searches and migrations much faster, but this may not
matter to you if you just need bulk storage. (With TMS for metadata we
can search the metadata attributes of ~300M files in about 10 minutes.)
The DDN also supports MAID if you have older data sets that aren't
accessed much: you can tier the data there on a schedule and then put
the disks into low-power mode.

GPFS is fast and works well for cluster access, allowing horizontal
scaling, but our reliability experience with it has been uneven. If one
head has a GPFS hard lock-up, NFS failover from that head to another
head will not work. That's a failing we're trying to get addressed, and
we've had too many of these, but we have no problems with it keeping up
with read or write load under normal circumstances. On the integrity
side, DDN does parity verification on read and scrubs to keep phantom
bit flips to a minimum and weed out bad data (10-disk stripes, 2-disk
parity, across 10 storage controllers with dual connections).
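
For a feel of what those tiering policies do: GPFS expresses them in
its own SQL-like rule language, but the selection logic underneath is
simple. Here's a minimal Python sketch of an age-based scan; the 90-day
cutoff is a made-up placeholder, not our actual policy, and the real
migration is done by the policy engine, not a script like this:

    import os
    import sys
    import time

    AGE_DAYS = 90  # hypothetical cutoff; anything untouched this long goes cold

    def migration_candidates(root, age_days=AGE_DAYS):
        """Yield (path, size) for files last accessed before the cutoff."""
        cutoff = time.time() - age_days * 86400
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished mid-scan; skip it
                if st.st_atime < cutoff:
                    yield path, st.st_size

    if __name__ == "__main__":
        total = 0
        for path, size in migration_candidates(sys.argv[1]):
            total += size
            print(path)  # feed this to whatever moves data to the slow tier
        print("would migrate %.2f TB" % (total / 1e12), file=sys.stderr)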
Filetek is a very interesting archive product that we've started using
lately. You can put basically whatever you want behind Filetek; some
crazy people put Filetek in front and then put NetApps behind it.
Filetek is essentially a virtual filesystem with all of the metadata in
a database, for "infinite" scalability, with a policy engine behind it.
They help you set it up according to your needs; traditionally that's
something like two copies on two tapes of every file, maybe an archive
copy, plus a performance buffer. So you could also put Filetek in front
of a DDN, for instance, and use it like a big filesystem with a virtual
or real tape backend. It keeps checksums on every file and can perform
audit operations as well, which makes it very useful for data-integrity
guarantees. When Filetek reads something that doesn't pass its
checksum, it invalidates that copy, pulls another one if you have two
(say, two tapes), and then makes another copy to keep your per-file
minimum-copies guarantee.
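
Filetek's internals aren't public, so take this only as a sketch of
that verify-and-repair loop; the 'store' object with
fetch/invalidate/replicate is a made-up stand-in for the tape/disk
back-ends, and the hash choice is mine, not theirs:

    import hashlib

    MIN_COPIES = 2  # the per-file minimum-copies guarantee (e.g. two tapes)

    def read_with_repair(copies, expected_sha256, store):
        """Return data from the first copy that passes its checksum.

        'store' is a hypothetical interface: fetch(loc) -> bytes,
        invalidate(loc), replicate(data) -> new location.
        """
        good = None
        surviving = []
        for loc in copies:
            data = store.fetch(loc)
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                surviving.append(loc)
                if good is None:
                    good = data
            else:
                store.invalidate(loc)  # bad checksum: drop this copy
        if good is None:
            raise IOError("all copies failed verification")
        while len(surviving) < MIN_COPIES:
            surviving.append(store.replicate(good))  # restore the copy count
        return good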
I believe Isilon only recently added data verification on read (or
maybe is adding it soon?).
I would not talk to NetApp here, contrary to what others have said.
WAAAAAY too much money.
One final thing to look at would be Nexenta. It's basically an
OpenSolaris kernel (ZFS, DTrace, deduplication, integrity checksums,
etc.) with a Debian Linux user space, and you can put it on whatever
hardware you want. They will support this and charge for support based
on the number of TB of storage; you can get up to ~20TB for nothing
(supporting it yourself). The big win is ZFS integrity checking plus
inexpensive disk. Combine this with some reasonable flash drives for
the ZIL (Intel X25-E, OCZ RevoDrive X2, etc.) and you've got a pretty
darn fast and inexpensive large block-storage filesystem with snapshots
and a really good backup story.

We're evaluating Nexenta on some Supermicro boxes that have 36x2TB
drives in the main chassis (4U) and 47x2TB drives in an expansion
chassis (also 4U) connected with 6Gbit SAS. That's ~160TB of disk in
8U. You need to check the cooling angle, since half the disks are in
the hot aisle; in our place it shouldn't be a problem. We're using the
RevoDrive X2 PCIe card internal to the Supermicro boxes as the boot/ZIL
device and the 2TB drives as data. Setup is still in progress. If you
plan to buy a lot of these, I can put you in touch with our cluster
vendor, who will build them to spec and charge a small markup for
assembly and hardware support/RMA.
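
If you want to sanity-check the usable capacity before buying, the
arithmetic is easy to script. The raidz2 vdev width below is an
assumption for illustration, not our actual pool layout:

    # Back-of-the-envelope capacity for the Supermicro build above.
    DRIVE_TB = 2.0
    DRIVES = 36 + 47        # main chassis + expansion chassis
    VDEV_WIDTH = 11         # assumed raidz2 vdev width (9 data + 2 parity)
    PARITY = 2

    raw = DRIVES * DRIVE_TB
    vdevs = DRIVES // VDEV_WIDTH            # 7 vdevs; leftovers become spares
    usable = vdevs * (VDEV_WIDTH - PARITY) * DRIVE_TB

    print("raw: %.0f TB in 8U" % raw)              # ~166 TB
    print("usable with raidz2: %.0f TB" % usable)  # ~126 TB before ZFS overhead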
The guys at Berkeley Communications will sell Supermicro boxes with
Nexenta or OpenSolaris Indiana for some support money, if you want
integration help at a reasonable cost. They sell a lot of these things
to oil and gas exploration companies and do clustering as well.
Raid Inc. also sells the same Supermicro boxes I mentioned, off the
shelf, with a very low markup for support and integration (probably
worth it to not have to worry about motherboard revisions, etc.).
One other thing with a lot of interesting promise is Ibrix, which HP
bought last year and has put a lot of development effort into. It's
very interesting technology with some pretty cool usage modes and a
good story on scalability. They use independent nodes, like Isilon, and
you supply a layout policy on top of them, but the Ibrix client is
really good about smart prefetch and knows which back-end to talk to
because of the distributed metadata. HP also seems to have some other
large-scale storage options, but I haven't checked the price.
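
The point about the client knowing which back-end to talk to is just
deterministic placement. Ibrix's real layout policies are richer than a
plain hash, but the principle looks something like this (server names
made up):

    import hashlib

    SEGMENT_SERVERS = ["seg01", "seg02", "seg03", "seg04"]  # hypothetical

    def owner(path):
        """Map a path deterministically onto one back-end server, so
        every client computes the same answer without asking a central
        metadata master."""
        digest = hashlib.md5(path.encode()).digest()
        idx = int.from_bytes(digest[:4], "big") % len(SEGMENT_SERVERS)
        return SEGMENT_SERVERS[idx]

    print(owner("/data/seismic/run42.dat"))  # same result on every client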
Lastly, IBM seems to be cost-competitive with Isilon and will give you
SOFS (GPFS++) with either an IBM or a DDN storage back-end. They are
more expensive than buying the DDN directly (2-3x the cost), but you
probably get a better GPFS support option for scalability and vendor
bug support. With DDN and GPFS bought separately, every GPFS issue goes
through an intermediary, and escalation can take a very long time.