Sorry, Prabhu, for hijacking this discussion a bit. I wonder what the best practice is for loading data into HDFS in general. Considering the size of the data (often gigabytes or terabytes), how are storage and time constraints handled?
If anybody can share their experiences or best practices, that would be great!

-Shailesh

From: Chen He [mailto:[email protected]]
Sent: Wednesday, September 05, 2012 7:34 PM
To: [email protected]
Subject: Re: One petabyte of data loading into HDFS with in 10 min.

If it is not a single file, you can upload the files to HDFS using multiple threads.

On Wed, Sep 5, 2012 at 7:21 AM, prabhu K <[email protected]> wrote:

Hi Users,

Please clarify the questions below.

1. To load one petabyte of data into HDFS/Hive within 10 minutes, how many slave (DataNode) machines are required?
2. To load one petabyte of data into HDFS/Hive within 10 minutes, what configuration setup is needed for cloud computing?

Please suggest and help me on this.

Thanks & Regards,
Prabhu.
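
For a rough sense of scale on question 1, here is a back-of-envelope sketch in Python. It is only an estimate under stated assumptions: the per-node sustained write rate of 100 MB/s is a made-up illustrative figure, not a measurement, and the function name `datanodes_needed` is hypothetical. It uses the HDFS default replication factor of 3 and ignores network topology, NameNode load, and how fast the source systems can supply the data.

```python
PB = 10**15  # one petabyte in bytes (decimal)
MB = 10**6   # one megabyte in bytes (decimal)


def datanodes_needed(data_bytes, window_seconds, per_node_write_bytes_per_s,
                     replication=3):
    """Estimate how many DataNodes are needed to absorb the write volume.

    Every block is written `replication` times, so the cluster must absorb
    data_bytes * replication bytes of disk writes inside the time window.
    """
    total_bytes_written = data_bytes * replication
    # Bytes one node can write during the window (assumed sustained rate).
    per_node_capacity = per_node_write_bytes_per_s * window_seconds
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_bytes_written // per_node_capacity)


if __name__ == "__main__":
    window = 10 * 60  # 10 minutes in seconds
    ingest_rate = PB / window  # client-side ingest rate in bytes/s
    print(f"Required ingest rate: {ingest_rate / 10**12:.2f} TB/s")

    # ASSUMPTION: each DataNode sustains 100 MB/s of writes.
    nodes = datanodes_needed(PB, window, 100 * MB, replication=3)
    print(f"DataNodes needed at 100 MB/s per node: {nodes}")
```

Under these assumptions the ingest rate is about 1.67 TB/s, and the node count scales linearly with the replication factor and inversely with the per-node write rate, so the real answer depends heavily on the actual hardware.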
