On 3/8/2011 12:22 AM, Andrew Hume wrote:
while i am no stranger to large data, i have found myself outside my comfort
zone at work. the department i have joined (i lateralled within research)
has traditionally used Sun as mid-range storage and Hitachi as their high-end.

we now need to look at multi-PB disk systems (say 2-5PB), which i have
always thought of as a different market, with players like Panasas.
as normal, we care rather less about IOPS and more about bandwidth
and $ per TB.

anyone (doug??) with comments pro or con for this market?

andrew

Isilon is worth looking at, as others have said. They have some interesting technology around rebuilds, scalability, and balancing, so that an incremental purchase is just buying another brick. They do have some scalability limits, though, based on the size of the units you purchase and the largest cluster you can buy; different people work around this by partitioning their data sets. They also have some newer features coming out soon with respect to tiering.

You should also talk to DDN, with or without GPFS. They are cheaper than Isilon by a fair margin; with 2TB disks you can put ~1PB in a single 19" rack. Their redundancy is good, and the price is very good for raw block storage. You can use IB or FC for host connectivity, and then you have several options: use DDN simply for block storage and do your own thing, put a traditional filesystem on it, or put a cluster filesystem on it.

We use GPFS as the cluster filesystem because it has a lot of very useful data-tiering features that make backups and finding files a lot easier, and let you keep certain file types, directories, or extensions on different tiers of storage. We put all of our metadata on a TMS to make searching and migrations much faster (with TMS for metadata we can search the metadata attributes of ~300M files in about 10 minutes), but that may not matter if you just need bulk storage. The DDN also supports MAID if you have older data sets that aren't accessed much: you can tier the data there on a schedule and then put the disks into low-power mode.

GPFS is fast and works pretty well for cluster access and horizontal scaling, but reliability has been uneven for us. If one head has a GPFS hard lock-up, NFS failover from that head to another will not work; that's a failing we're trying to get addressed, and we've had too many of them. Under normal circumstances, though, it has no trouble keeping up with read or write load. DDN also does parity verification on read and scrubbing to keep phantom bit flips to a minimum and weed out bad data (10-disk stripes, 2-disk parity, across 10 storage controllers with dual connections).
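
To give a flavor of what the GPFS tiering looks like: placement and migration are driven by SQL-ish policy rules that you install with mmchpolicy and run with mmapplypolicy. A rough sketch (the pool names, thresholds, and patterns here are made up for illustration, not our actual policy):

    /* new files land on the fast pool by default */
    RULE 'default-placement' SET POOL 'fast'

    /* push files not read in 90 days down to the capacity/MAID pool */
    RULE 'tier-cold' MIGRATE FROM POOL 'fast' TO POOL 'capacity'
      WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90

    /* keep log files off the fast tier entirely */
    RULE 'logs-on-capacity' SET POOL 'capacity' WHERE UPPER(NAME) LIKE '%.LOG'

Backups and "find everything matching X" jobs can use the same policy engine (LIST rules run through mmapplypolicy) instead of walking the tree, which is a big part of why the metadata-on-TMS trick pays off.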

FileTek is a very interesting archive product that we've been starting to use lately. You can put basically whatever you want behind it; some crazy people put FileTek in front and NetApps behind. FileTek is essentially a virtual filesystem with all of the metadata in a database, for "infinite" scalability, with a policy engine behind it. They help you set it up according to your needs; traditionally that is something like 2 copies on 2 tapes of every file, maybe plus an archive copy, plus a performance buffer in front. So you could also put FileTek in front of a DDN, for instance, and use it like a big filesystem with a virtual or real tape backend. It keeps checksums on every file and can run audit operations as well, which makes it very useful for data-integrity guarantees: when FileTek reads something that doesn't pass the checksum, it invalidates that copy, pulls another one if you have 2 (say 2 tapes), and then makes another copy to keep your minimum-copies-per-file guarantee.
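
The self-healing read path is the part I like. Conceptually it's something like this (just a toy sketch of the idea in Python, not FileTek's actual interface):

    import hashlib

    def digest(data):
        return hashlib.sha256(bytes(data)).hexdigest()

    class ArchiveFile:
        """Toy model: one logical file, N physical copies, one stored checksum."""
        MIN_COPIES = 2

        def __init__(self, data):
            self.checksum = digest(data)
            self.copies = [bytearray(data) for _ in range(self.MIN_COPIES)]

        def read(self):
            good = []
            for copy in list(self.copies):
                if digest(copy) == self.checksum:
                    good.append(copy)
                else:
                    self.copies.remove(copy)            # invalidate the bad copy
            if not good:
                raise IOError("all copies failed verification")
            while len(self.copies) < self.MIN_COPIES:
                self.copies.append(bytearray(good[0]))  # re-replicate to restore the copy count
            return bytes(good[0])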

I believe Isilon only recently added data verification on read (or maybe is adding it soon?).

I would not talk to NetApp here, contrary to what others have said. WAAAAAY too much money.

One final thing to look at would be Nexenta. It's basically an OpenSolaris kernel (ZFS, DTrace, deduplication, integrity checksums, etc.) with a Debian user space. You can put Nexenta on whatever hardware you want; they will support it and charge for support based on the number of TB of storage, and under ~20TB is free (you support it yourself). The big win is ZFS integrity checking on inexpensive disk. Combine this with some reasonable flash drives for the ZIL (Intel X25-E, OCZ RevoDrive X2, etc.) and you've got a pretty darn fast and inexpensive large block-storage filesystem with snapshots and a really good backup story.

We're evaluating Nexenta on some Supermicro boxes that have 36x 2TB drives in the main chassis (4U) and 47x 2TB drives in an expansion chassis (also 4U) connected with 6Gbit SAS. That's ~160TB of disk in 8U. You need to check the cooling angle, since half the disks are in the hot aisle; in our place it shouldn't be a problem. We're using the RevoDrive X2 PCI card inside the Supermicro boxes as the boot/ZIL device and the 2TB drives as data. Setup is still in progress. If you plan to buy a lot of these, I can put you in touch with our cluster vendor, who will build them to spec and charge a small markup for assembly and hardware support/RMA.
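
If it helps to picture it, the pool layout on a box like that ends up being a handful of raidz2 vdevs plus the flash log device, roughly like this (device names and vdev widths below are illustrative, not our exact build):

    # ~8 disks per raidz2 vdev, repeated until all the data disks are used
    zpool create tank \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
        raidz2 c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0 \
        log c2t0d0                    # PCIe flash as a separate intent log (ZIL)
    zfs set compression=on tank
    zfs snapshot -r tank@nightly      # cheap snapshots are a big part of the backup story

The separate log device mainly helps synchronous writes (which NFS does a lot of); for pure bulk streaming it matters less.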

The guys at Berkeley Communications will sell Supermicro boxes with Nexenta or OpenSolaris Indiana, with integration help and support, for a reasonable cost. They sell a lot of these things to oil and gas exploration companies and do clustering as well.

Raid Inc. also sells the same Supermicro boxes I mentioned, off the shelf, with a very low markup for support and integration (probably worth it so you don't have to worry about motherboard revisions, etc.).

One other thing with a lot of interesting promise is Ibrix, which HP bought last year and has put a lot of development effort into. It's very interesting technology with some pretty cool usage modes and a good story on scalability. They use independent nodes like Isilon, and you supply a layout policy on top of them; the Ibrix client is really good about smart pre-fetch and knows which back-end to talk to because of the distributed metadata. HP also seems to have some other large-scale storage options, but I haven't checked the prices.

Lastly, IBM seems to be cost-competitive with Isilon and will give you SOFS (GPFS++) with either an IBM or a DDN storage back-end. They are more expensive than buying the DDN directly (2-3x the cost), but you probably get a better support story for scalability and vendor bugs: when you buy DDN and GPFS separately, everything goes through an intermediary on GPFS issues, and escalation can take a very long time.


