Sorry, Prabhu, for hijacking this discussion a bit. I wonder what the best
practice is for loading data into HDFS in general. Given the size of the data
(often GBs or TBs), how are the storage and time constraints handled?

If anybody can share their experiences or best practices, it would be great!

-Shailesh.

From: Chen He [mailto:[email protected]]
Sent: Wednesday, September 05, 2012 7:34 PM
To: [email protected]
Subject: Re: One petabyte of data loading into HDFS with in 10 min.

If the data is not a single file, you can upload the files to HDFS in parallel using multiple threads.
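Here is a minimal Python sketch of the multi-threaded approach. It assumes the source files sit on a local disk of a machine where the `hdfs` command-line client is installed and configured for the target cluster. `hdfs_put` and `parallel_upload` are hypothetical helper names, and the default of 8 workers is an arbitrary illustration, not a tuned value.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def hdfs_put(local_path: str, hdfs_dir: str) -> None:
    """Copy one local file into an HDFS directory using the standard CLI.

    Assumes the `hdfs` command is on PATH and points at the target cluster.
    """
    subprocess.run(["hdfs", "dfs", "-put", local_path, hdfs_dir], check=True)


def parallel_upload(
    local_files: Iterable[str],
    hdfs_dir: str,
    max_workers: int = 8,  # hypothetical default, tune for your hardware
    uploader: Callable[[str, str], None] = hdfs_put,
) -> List[str]:
    """Upload independent files concurrently and return the list of files.

    Each file is handled by its own task on a thread pool; any failure in
    an upload is re-raised to the caller.
    """
    files = list(local_files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(uploader, f, hdfs_dir) for f in files]
        for fut in futures:
            fut.result()  # re-raises any exception raised by the uploader
    return files
```

Each file is an independent `-put`, so throughput should grow with the worker count until the client's network link or disks, or the cluster itself, become the bottleneck. At that point one would presumably spread the workers across several client machines instead of adding more threads on one.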
On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <[email protected]> wrote:
Hi Users,

Please help me with the following questions.

1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many
slave (DataNode) machines are required?

2. To load one petabyte of data into HDFS/Hive within 10 minutes, what
configuration setup is needed for cloud computing?

Any suggestions or help would be appreciated.
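A back-of-the-envelope estimate for question 1, written as a small Python helper. It assumes HDFS's default replication factor of 3 and a hypothetical sustained write rate of 1 GB/s per DataNode (roughly what one 10 Gbit/s link can carry, since 10 Gbit/s = 1.25 GB/s). `datanodes_needed` is a hypothetical name, and the result is only a lower bound.

```python
import math


def datanodes_needed(
    data_bytes: float,
    window_seconds: float,
    per_node_write_bytes_per_s: float,
    replication: int = 3,
) -> int:
    """Lower bound on DataNodes needed to absorb an ingest within a window.

    Every byte is stored `replication` times, so the cluster must sustain
    data_bytes * replication / window_seconds of aggregate writes. This
    ignores NameNode limits, client bandwidth, network topology and
    stragglers, so a real cluster would need extra headroom.
    """
    aggregate_rate = data_bytes * replication / window_seconds
    return math.ceil(aggregate_rate / per_node_write_bytes_per_s)


PB = 10**15  # decimal petabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes

# Hypothetical assumption: each DataNode sustains ~1 GB/s of writes.
print(datanodes_needed(PB, 10 * 60, 1 * GB))  # → 5000
```

Under those assumptions the cluster must absorb 3 PB of replicated writes in 600 s (5 TB/s aggregate), which works out to 5,000 DataNodes and at least 3 PB of raw disk before any headroom. The source side must also deliver about 1.67 TB/s (1 PB / 600 s). Changing the per-node rate or the replication factor scales the answer proportionally.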

Thanks & Regards,
Prabhu.

