Since cost wasn't mentioned as a requirement...
An army of people mounting physical drives holding the original dataset
on the cluster machines, with a MapReduce job copying from local disk,
would likely be faster.
There are also 40Gbps InfiniBand solutions available. Also, the
replication traffic could be pushed to a separate network and would
eventually reach consistency (presumably not required within the 10
minutes), thus lowering the volume that has to cross the primary
connection to 1PB.
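To put rough numbers on that, here is a minimal sketch comparing how many 40Gbps links the primary network would need with and without replication traffic on it. It assumes decimal units (1PB = 10^15 bytes, 1Gbps = 10^9 bits/s), perfect link utilization and no protocol or disk overhead; the function name is made up for illustration.

```python
import math

PB = 10**15    # assumption: decimal petabyte, in bytes
GBPS = 10**9   # assumption: decimal gigabit per second, in bits/s

def links_for_deadline(data_bytes, link_bps, deadline_s):
    """Minimum number of fully utilized links to move data_bytes in deadline_s."""
    total_seconds_on_one_link = data_bytes * 8 / link_bps
    return math.ceil(total_seconds_on_one_link / deadline_s)

deadline_s = 10 * 60

# All three replicas cross the primary network.
links_with_replication = links_for_deadline(3 * PB, 40 * GBPS, deadline_s)

# Only the first copy crosses it; replicas go over a separate network.
links_without_replication = links_for_deadline(1 * PB, 40 * GBPS, deadline_s)

print(links_with_replication, links_without_replication)
```

These are theoretical lower bounds only; real throughput per link would be lower.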
On 09/05/2012 09:43 AM, Cosmin Lehene wrote:
Here's an extremely naïve ballpark estimate at theoretical hardware
speed, for 3PB (that is, 1PB with 3x replication).
Over a single 1Gbps connection (and I'm not sure you can actually
reach 1Gbps):
(3 petabytes) / (1 Gbps) = 291.271111 days
So you'd need at least 40,000 1Gbps network cards to get that done in
10 minutes :) - (3PB/1Gbps)/40000
<http://www.google.ro/search?client=safari&rls=en&q=%283PB/1Gbps%29/40000&ie=UTF-8&oe=UTF-8&redir_esc=&ei=2WRHUNWtGIWo0QW52oDYDw>
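The same arithmetic as a small sketch, for anyone who wants to plug in other numbers. The 291.27 days figure corresponds to binary units (1PB = 2^50 bytes, 1Gbps = 2^30 bits/s); with decimal units the single-link time is about 277.8 days and the card count comes out at exactly 40,000. The function names are illustrative, and the model ignores protocol overhead, disks and topology.

```python
import math

def transfer_seconds(data_bytes, link_bps):
    """Time to push data_bytes over one link at its full theoretical rate."""
    return data_bytes * 8 / link_bps

def nics_needed(data_bytes, link_bps, deadline_s):
    """Number of such links needed to finish within deadline_s."""
    return math.ceil(transfer_seconds(data_bytes, link_bps) / deadline_s)

SECONDS_PER_DAY = 86400
deadline_s = 10 * 60

# Binary units: 1PB = 2**50 bytes, 1Gbps = 2**30 bits/s.
days_binary = transfer_seconds(3 * 2**50, 2**30) / SECONDS_PER_DAY
nics_binary = nics_needed(3 * 2**50, 2**30, deadline_s)

# Decimal units: 1PB = 10**15 bytes, 1Gbps = 10**9 bits/s.
days_decimal = transfer_seconds(3 * 10**15, 10**9) / SECONDS_PER_DAY
nics_decimal = nics_needed(3 * 10**15, 10**9, deadline_s)

print(f"binary:  {days_binary:.2f} days on one link, {nics_binary} NICs")
print(f"decimal: {days_decimal:.2f} days on one link, {nics_decimal} NICs")
```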
The actual number of nodes would depend a lot on the actual network
architecture, the type of storage you use (SSD, HDD), etc.
Cosmin
From: prabhu K <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, September 5, 2012 3:21 PM
To: "[email protected]" <[email protected]>
Subject: One petabyte of data loading into HDFS with in 10 min.
Hi Users,
Please clarify the questions below.
1. To load one petabyte of data into HDFS/Hive within 10 minutes, how
many slave machines (DataNodes) are required?
2. To load one petabyte of data into HDFS/Hive within 10 minutes, what
configuration setup is needed for cloud computing?
Please suggest and help me with this.
Thanks & Regards,
Prabhu.