@Nitin Pawar, thanks for clearing my doubts. But I have one more question: say I have 10 TB of data in the pipeline.
Is it perfectly OK to use the hadoop fs -put command to upload these files totalling 10 TB, and is there any limit on file size when using the hadoop command line? Can hadoop fs -put work with data that large? (I have put a couple of rough sketches at the bottom of this mail, below the quoted thread.)

Thanks in advance

On Sat, May 11, 2013 at 4:24 PM, Nitin Pawar <[email protected]> wrote:

> First of all, most companies do not get 100 PB of data in one go. It is an
> accumulating process, and most companies have a data pipeline in place
> where data is written to HDFS on a regular schedule, retained on HDFS for
> as long as needed, and from there sent to archival storage or deleted.
>
> For data management products, you can look at Falcon, which was open
> sourced by InMobi along with Hortonworks.
>
> In any case, if you want to write files to HDFS, there are a few options
> available to you:
> 1) Write your own DFS client which writes to HDFS
> 2) Use the HDFS proxy
> 3) Use WebHDFS
> 4) Use the command line hdfs tools
> 5) Use data collection tools that support writing to HDFS, like Flume etc.
>
>
> On Sat, May 11, 2013 at 4:19 PM, Thoihen Maibam <[email protected]> wrote:
>
>> Hi All,
>>
>> Can anyone help me understand how companies like Facebook, Yahoo, etc.
>> upload bulk files, say to the tune of 100 petabytes, to a Hadoop HDFS
>> cluster for processing, and how, after processing, they download those
>> files from HDFS to the local file system?
>>
>> I don't think they use the command line hadoop fs -put to upload files,
>> as it would take too long. Or do they divide the data into, say, 10 parts
>> of 10 petabytes each, compress them, and then use hadoop fs -put?
>>
>> Or do they use some other tool to upload huge files?
>>
>> Please help me.
>>
>> Thanks
>> thoihen
>
>
>
> --
> Nitin Pawar
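P.S. For reference, this is roughly what I am planning to run. The local directory /data/incoming and the HDFS path /user/thoihen/incoming are just placeholders for my actual locations:

    # create the target directory in HDFS and stream the local files into it
    hadoop fs -mkdir /user/thoihen/incoming
    hadoop fs -put /data/incoming/*.dat /user/thoihen/incoming/

    # check what landed and how much space it takes
    hadoop fs -ls /user/thoihen/incoming
    hadoop fs -du /user/thoihen/incoming

    # after processing, pull results back to the local file system
    hadoop fs -get /user/thoihen/output /data/results/

If splitting the 10 TB helps, I could also run several put commands in parallel, one per subdirectory, but I am not sure whether that is recommended.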
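P.P.S. In case it is useful to others reading this, my understanding of option 3 (WebHDFS) from the list above is that an upload is a two-step REST call, something like the following with curl. The host names, ports, and paths are placeholders, and I have not tried this myself:

    # step 1: ask the namenode where to write; it replies with a 307 redirect
    # whose Location header points at a datanode
    LOCATION=$(curl -s -i -X PUT \
      "http://namenode.example.com:50070/webhdfs/v1/user/thoihen/incoming/part-0001.dat?op=CREATE&user.name=thoihen&overwrite=true" \
      | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')

    # step 2: stream the local file to the datanode URL from step 1
    curl -i -X PUT -T /data/incoming/part-0001.dat "$LOCATION"

Please correct me if I have the WebHDFS steps wrong.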
