Ryan Byrd wrote:
Let's say I need to store a petabyte of data. I need fast access (tape/DVDs
aren't fast enough) and redundancy like RAID.
By "fast access" do you mean both read and write, mainly read (once it's
written), or...? What RAID level would you run? (This will impact both your
access speed and the raw size of the storage you need.)
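To put rough numbers on the RAID question, here is a minimal Python sketch. It assumes 1PB = 1000TB usable, a 14-disk RAID group (borrowed from the 14x300GB shelf in Scenario 1), and a simplified model that ignores hot spares and filesystem overhead. The function name is purely illustrative.

```python
def raw_capacity_tb(usable_tb, raid_level, disks_per_group=14):
    """Return raw TB needed to provide usable_tb under a simplified RAID model.

    Assumptions (not from any vendor spec): no hot spares, no filesystem
    overhead, and every RAID group has exactly disks_per_group disks.
    """
    if raid_level == "raid10":
        efficiency = 0.5  # every block is mirrored once
    elif raid_level == "raid5":
        efficiency = (disks_per_group - 1) / disks_per_group  # one parity disk
    elif raid_level == "raid6":
        efficiency = (disks_per_group - 2) / disks_per_group  # two parity disks
    else:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    return usable_tb / efficiency


for level in ("raid5", "raid6", "raid10"):
    print(f"{level}: {raw_capacity_tb(1000, level):.0f} TB raw for 1000 TB usable")
```

Under these assumptions, mirroring doubles the raw storage you have to buy, while single parity across 14 disks adds roughly 8%.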
How should I do it? Initially the data will be in large (several MB) binary
objects and could be stored as files, but eventually, it will need to be
placed into a relational database like Oracle.
How will the data be accessed? Are the "several MB binary objects" to be read
as chunks or streamed, accessed at random locations, indexed, searched?
Let's say I have access to racks that are 44U tall. I've listed several very
different scenarios. Which is best? Is there a better way? How would *you*
store 1PB? What do you/your company currently use to store your large
datasets?
Large SANs.
What kind of drives should I use? Do SCSI drives last longer than
SATA?
Depends on who you ask, but I doubt you're going to find an enterprise-class
storage system running on SATA drives.
What about IDE or SAS?
SAS has probably already surpassed regular old SCSI.
Do higher RPM drives have a shorter mean time to failure?
Perhaps, but that's going to be the least of your worries... With the
thousands of drives required to make up your 1PB, you'll be lucky if only a few
fail before even spinning up.
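As a back-of-the-envelope check on "thousands of drives", here is a small Python sketch. It assumes 1PB = 1000TB of raw capacity with no redundancy, the 300GB drive size from Scenario 1, and a purely hypothetical 3% annual failure rate; plug in a real figure from your vendor or your own fleet data.

```python
import math


def drives_needed(total_tb, drive_gb):
    """Number of whole drives needed to reach total_tb of raw capacity."""
    return math.ceil(total_tb * 1000 / drive_gb)


def expected_failures_per_year(drive_count, annual_failure_rate):
    """Expected drive failures per year, assuming independent failures."""
    return drive_count * annual_failure_rate


# Hypothetical annual failure rate, for illustration only.
ASSUMED_AFR = 0.03

drives = drives_needed(1000, 300)
failures = expected_failures_per_year(drives, ASSUMED_AFR)
print(f"{drives} drives, about {failures:.0f} expected failures per year")
```

Even before adding RAID overhead, that is well over three thousand spindles, so replacing failed drives becomes a routine chore rather than a rare event.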
How will the data be sucked off the disk? Several large servers? A large farm
of clients? You won't want to connect it all to a single machine, so...where
are your real bottlenecks?
Do you need some of the enterprise features such as snapshotting, etc?
Redundancy (not just disk, but connectivity, power feeds and supplies, etc.)?
Scenario 1:
----------------
EMC CLARiiON CX300 PSI w/ Fiber channel
14x300GB ultrawide SCSI
Sounds like small potatoes. If you're going to get something from EMC, why not
something larger than the CX300? DMX series perhaps
(http://www.emc.com/products/systems/dmx_compare.jsp)? Rather than 33 racks full of
tiny boxes at only 31TB per rack, there are larger boxes (storage-wise) that have a
smaller footprint. In fact, <VENDOR REDACTED> will be coming out with a very
nice rackmount storage box that...er...never mind...I can't talk about that.
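The rack count follows directly from the density. A quick Python sketch, assuming 1PB = 1000TB and the 31TB-per-rack figure above; the 100TB-per-rack value is a made-up number for comparison, not a quote for any real product.

```python
import math


def racks_needed(total_tb, tb_per_rack):
    """Whole racks required to hold total_tb at a given density per rack."""
    return math.ceil(total_tb / tb_per_rack)


# 31 TB per rack is the Scenario 1 figure; 100 TB per rack is hypothetical.
for density in (31, 100):
    print(f"{density} TB/rack -> {racks_needed(1000, density)} racks")
```

Footprint scales inversely with density, which is why a denser box can pay for itself in floor space, power, and cooling.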
Scenario 2:
----------------
Dell PowerVault MD3000
And connect them all to what? Dell jokes aside, is this really what you'd want?
Scenario 3 & 4:
----------------
HP ProLiant DL* Server
Yes, it'd be a lot cheaper starting out, but what about the maintenance associated with an operating system, etc.?
Do you have lots of datacenter space that you really need to use up, or does
physical size matter (I mean, you're talking about solutions that take up
nearly 50 racks!!!)? What are your power and cooling capacities and costs in
your datacenter? With that much hardware, your environmentals are going to
become a factor. What are your maintenance and support costs?
I suspect you don't already have 1PB of data to fill up all this storage right
away, so can you use a phased approach that allows you to get some storage now,
more later on, etc.? Take advantage of falling costs, improving technology,
and what the vendor you choose has on their roadmap as you fill out the entire
1PB. Is the 1PB expected to grow? Where is the data coming from, and how
quickly will it be generated, etc.?
There are lots more questions to answer before you can really decide on a
solution, and with that kind of money at stake, you'll need to define your
requirements in a lot more detail. I suggest you take the time to do that, then
request proposals from several vendors. With the
appropriate reciprocal NDAs, and proposals from vendors, you should be able to
make a more informed decision.
If you have additional questions, or need help contacting a vendor, let me
know. My finder's fee and consulting costs are quite reasonable ;)
Frank