On 3/8/2011 12:22 AM, Andrew Hume wrote:
> While I am no stranger to large data, I have found myself outside my
> comfort zone at work. The department I have joined (I lateralled within
> research) has traditionally used Sun as mid-range storage and Hitachi
> as their high-end. We now need to look at multi-PB disk systems (say
> 2-5PB), which I have always thought of as a different market, with
> players like Panasas. As usual, we care rather less about IOPS and more
> about bandwidth and $ per TB.
> Anyone (doug??) with comments pro or con for this market?
> andrew
Isilon is worth looking at, as others have said. They have some
interesting technology for rebuilds, scalability, and balancing that
makes incremental purchase as simple as buying another brick, but they
do have scalability limits tied to the size of the units you purchase
and the largest cluster you can buy. Different people work around this
by partitioning their data sets. They also have some newer features
coming out soon with respect to tiering.
You should also talk to DDN, with or without GPFS. They are cheaper
than Isilon by a fair margin; with 2TB disks you can put ~1PB in a
single 19" rack. Their redundancy is good, and the price is very good
for raw block storage. You can use IB or FC for host connectivity, and
then you have several options: use the DDN purely for block storage and
do your own thing, put a traditional filesystem on it, or put a cluster
filesystem on it.

We use GPFS for the cluster filesystem because it has a lot of very
useful data-tiering features that make backups and finding files much
easier, and that let you keep certain file types, directories, or
extensions on different tiers of storage. We put all of our metadata on
a TMS to make searches and migrations much faster, but this may not
matter to you if you just need bulk storage. (With TMS for metadata we
can search the metadata attributes of ~300M files in about 10 minutes.)
The DDN also supports MAID if you have older data sets that aren't
accessed much: you can tier the data there on a schedule and then put
the disks into low-power mode.

GPFS is fast and works well for cluster access, allowing horizontal
scaling, but our reliability experience with it has been uneven. If one
head has a GPFS hard lock-up, NFS failover from that head to another
head will not work. That's a failing we're trying to get addressed, and
we've had too many of these, but we have no problems with it keeping up
with read or write load under normal circumstances. On the integrity
side, DDN does parity verification on read and scrubs to keep phantom
bit flips to a minimum and weed out bad data (10-disk stripes, 2-disk
parity, across 10 storage controllers with dual connections).
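
For a feel of what those tiering policies do: GPFS expresses them in
its own SQL-like rule language, but the selection logic underneath is
simple. Here's a minimal Python sketch of an age-based scan; the 90-day
cutoff is a made-up placeholder, not our actual policy, and the real
migration is done by the policy engine, not a script like this:

    import os
    import sys
    import time

    AGE_DAYS = 90  # hypothetical cutoff; anything untouched this long goes cold

    def migration_candidates(root, age_days=AGE_DAYS):
        """Yield (path, size) for files last accessed before the cutoff."""
        cutoff = time.time() - age_days * 86400
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished mid-scan; skip it
                if st.st_atime < cutoff:
                    yield path, st.st_size

    if __name__ == "__main__":
        total = 0
        for path, size in migration_candidates(sys.argv[1]):
            total += size
            print(path)  # feed this to whatever moves data to the slow tier
        print("would migrate %.2f TB" % (total / 1e12), file=sys.stderr)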
Filetek is a very interesting archive product that we've started using
lately. You can put basically whatever you want behind Filetek; some
crazy people put Filetek in front and then put NetApps behind it.
Filetek is essentially a virtual filesystem with all of the metadata in
a database, for "infinite" scalability, with a policy engine behind it.
They help you set it up according to your needs; traditionally that's
something like two copies on two tapes of every file, maybe an archive
copy, plus a performance buffer. So you could also put Filetek in front
of a DDN, for instance, and use it like a big filesystem with a virtual
or real tape backend. It keeps checksums on every file and can perform
audit operations as well, which makes it very useful for data-integrity
guarantees. When Filetek reads something that doesn't pass its
checksum, it invalidates that copy, pulls another one if you have two
(say, two tapes), and then makes another copy to keep your per-file
minimum-copies guarantee.
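
Filetek's internals aren't public, so take this only as a sketch of
that verify-and-repair loop; the 'store' object with
fetch/invalidate/replicate is a made-up stand-in for the tape/disk
back-ends, and the hash choice is mine, not theirs:

    import hashlib

    MIN_COPIES = 2  # the per-file minimum-copies guarantee (e.g. two tapes)

    def read_with_repair(copies, expected_sha256, store):
        """Return data from the first copy that passes its checksum.

        'store' is a hypothetical interface: fetch(loc) -> bytes,
        invalidate(loc), replicate(data) -> new location.
        """
        good = None
        surviving = []
        for loc in copies:
            data = store.fetch(loc)
            if hashlib.sha256(data).hexdigest() == expected_sha256:
                surviving.append(loc)
                if good is None:
                    good = data
            else:
                store.invalidate(loc)  # bad checksum: drop this copy
        if good is None:
            raise IOError("all copies failed verification")
        while len(surviving) < MIN_COPIES:
            surviving.append(store.replicate(good))  # restore the copy count
        return good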
I believe Isilon only recently added data verification on read (or
maybe is adding it soon?).
I would not talk to NetApp here, contrary to what others have said.
WAAAAAY too much money.
One final thing to look at would be Nexenta. It's basically an
OpenSolaris kernel (ZFS, DTrace, deduplication, integrity checksums,
etc.) with a Debian Linux user space, and you can put it on whatever
hardware you want. They will support this and charge for support based
on the number of TB of storage; you can get up to ~20TB for nothing
(supporting it yourself). The big win is ZFS integrity checking plus
inexpensive disk. Combine this with some reasonable flash drives for
the ZIL (Intel X25-E, OCZ RevoDrive X2, etc.) and you've got a pretty
darn fast and inexpensive large block-storage filesystem with snapshots
and a really good backup story.

We're evaluating Nexenta on some Supermicro boxes that have 36x2TB
drives in the main chassis (4U) and 47x2TB drives in an expansion
chassis (also 4U) connected with 6Gbit SAS. That's ~160TB of disk in
8U. You need to check the cooling angle, since half the disks are in
the hot aisle; in our place it shouldn't be a problem. We're using the
RevoDrive X2 PCIe card internal to the Supermicro boxes as the boot/ZIL
device and the 2TB drives as data. Setup is still in progress. If you
plan to buy a lot of these, I can put you in touch with our cluster
vendor, who will build them to spec and charge a small markup for
assembly and hardware support/RMA.
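
If you want to sanity-check the usable capacity before buying, the
arithmetic is easy to script. The raidz2 vdev width below is an
assumption for illustration, not our actual pool layout:

    # Back-of-the-envelope capacity for the Supermicro build above.
    DRIVE_TB = 2.0
    DRIVES = 36 + 47        # main chassis + expansion chassis
    VDEV_WIDTH = 11         # assumed raidz2 vdev width (9 data + 2 parity)
    PARITY = 2

    raw = DRIVES * DRIVE_TB
    vdevs = DRIVES // VDEV_WIDTH            # 7 vdevs; leftovers become spares
    usable = vdevs * (VDEV_WIDTH - PARITY) * DRIVE_TB

    print("raw: %.0f TB in 8U" % raw)              # ~166 TB
    print("usable with raidz2: %.0f TB" % usable)  # ~126 TB before ZFS overhead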
The guys at Berkeley Communications will sell Supermicro boxes with
Nexenta or OpenSolaris Indiana for some support money, if you want
integration help at a reasonable cost. They sell a lot of these things
to oil and gas exploration companies and do clustering as well.
Raid Inc. also sells the same Supermicro boxes I mentioned, off the
shelf, with a very low markup for support and integration (probably
worth it to not have to worry about motherboard revisions, etc.).
One other thing with a lot of interesting promise is Ibrix, which HP
bought last year and has put a lot of development effort into. It's
very interesting technology with some pretty cool usage modes and a
good story on scalability. They use independent nodes, like Isilon, and
you supply a layout policy on top of them, but the Ibrix client is
really good about smart prefetch and knows which back-end to talk to
because of the distributed metadata. HP also seems to have some other
large-scale storage options, but I haven't checked the price.
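
The point about the client knowing which back-end to talk to is just
deterministic placement. Ibrix's real layout policies are richer than a
plain hash, but the principle looks something like this (server names
made up):

    import hashlib

    SEGMENT_SERVERS = ["seg01", "seg02", "seg03", "seg04"]  # hypothetical

    def owner(path):
        """Map a path deterministically onto one back-end server, so
        every client computes the same answer without asking a central
        metadata master."""
        digest = hashlib.md5(path.encode()).digest()
        idx = int.from_bytes(digest[:4], "big") % len(SEGMENT_SERVERS)
        return SEGMENT_SERVERS[idx]

    print(owner("/data/seismic/run42.dat"))  # same result on every client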
Lastly, IBM seems to be cost-competitive with Isilon and will give you
SOFS (GPFS++) with either an IBM or a DDN storage back-end. They are
more expensive than buying the DDN directly (2-3x the cost), but you
probably get a better GPFS support option for scalability and vendor
bug support. With DDN and GPFS bought separately, every GPFS issue goes
through an intermediary, and escalation can take a very long time.