@Nitin Pawar, thanks for clearing my doubts. But I have one more question: say I have 10 TB of data in the pipeline.
Is it perfectly OK to use the hadoop fs -put command to upload these files totalling 10 TB, and is there any limit on file size when using the hadoop command line? Can hadoop fs -put work with data that large? (I have put a couple of rough sketches at the bottom of this mail, below the quoted thread.)

Thanks in advance

On Sat, May 11, 2013 at 4:24 PM, Nitin Pawar <[email protected]> wrote:

> First of all, most companies do not get 100 PB of data in one go. It is an
> accumulating process, and most companies have a data pipeline in place
> where data is written to HDFS on a regular schedule, retained on HDFS for
> as long as needed, and from there sent to archival storage or deleted.
>
> For data management products, you can look at Falcon, which was open
> sourced by InMobi along with Hortonworks.
>
> In any case, if you want to write files to HDFS, there are a few options
> available to you:
> 1) Write your own DFS client which writes to HDFS
> 2) Use the HDFS proxy
> 3) Use WebHDFS
> 4) Use the command line hdfs tools
> 5) Use data collection tools that support writing to HDFS, like Flume etc.
>
>
> On Sat, May 11, 2013 at 4:19 PM, Thoihen Maibam <[email protected]> wrote:
>
>> Hi All,
>>
>> Can anyone help me understand how companies like Facebook, Yahoo, etc.
>> upload bulk files, say to the tune of 100 petabytes, to a Hadoop HDFS
>> cluster for processing, and how, after processing, they download those
>> files from HDFS to the local file system?
>>
>> I don't think they use the command line hadoop fs -put to upload files,
>> as it would take too long. Or do they divide the data into, say, 10 parts
>> of 10 petabytes each, compress them, and then use hadoop fs -put?
>>
>> Or do they use some other tool to upload huge files?
>>
>> Please help me.
>>
>> Thanks
>> thoihen
>
>
>
> --
> Nitin Pawar
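P.S. For reference, this is roughly what I am planning to run. The local directory /data/incoming and the HDFS path /user/thoihen/incoming are just placeholders for my actual locations:

    # create the target directory in HDFS and stream the local files into it
    hadoop fs -mkdir /user/thoihen/incoming
    hadoop fs -put /data/incoming/*.dat /user/thoihen/incoming/

    # check what landed and how much space it takes
    hadoop fs -ls /user/thoihen/incoming
    hadoop fs -du /user/thoihen/incoming

    # after processing, pull results back to the local file system
    hadoop fs -get /user/thoihen/output /data/results/

If splitting the 10 TB helps, I could also run several put commands in parallel, one per subdirectory, but I am not sure whether that is recommended.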
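P.P.S. In case it is useful to others reading this, my understanding of option 3 (WebHDFS) from the list above is that an upload is a two-step REST call, something like the following with curl. The host names, ports, and paths are placeholders, and I have not tried this myself:

    # step 1: ask the namenode where to write; it replies with a 307 redirect
    # whose Location header points at a datanode
    LOCATION=$(curl -s -i -X PUT \
      "http://namenode.example.com:50070/webhdfs/v1/user/thoihen/incoming/part-0001.dat?op=CREATE&user.name=thoihen&overwrite=true" \
      | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')

    # step 2: stream the local file to the datanode URL from step 1
    curl -i -X PUT -T /data/incoming/part-0001.dat "$LOCATION"

Please correct me if I have the WebHDFS steps wrong.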
