That's a tricky question, and the answer depends mostly on how you plan to use 
Hadoop, more specifically on your use case (for example, word count).
The answer should be split between storage (disk) and computation limits (disk, 
CPU, and memory).
1. Disk- if you are using the default file system (HDFS) with the default block 
size of 128 MB and a replication factor of 2, you would need 20 TB of space to 
store the 10 TB input, and there will be 2 * (10,000,000 / 128) = 156,250 block 
replicas.
Afterward, it depends on the size of your Map output (which will be deleted at 
the end of the shuffle) and of your Reduce output (which probably won't be the 
bottleneck).
If you expect your Map output to be no larger than the input (in the number and 
size of tuples), then I think around 40 TB would be enough.
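As a sanity check, the disk arithmetic above can be reproduced in a few lines (assuming, as in the question, a 10 TB input, the 128 MB default block size, and a replication factor of 2, with decimal units so 1 TB = 1,000,000 MB):

```python
# Back-of-envelope HDFS sizing: 10 TB input, 128 MB blocks, replication 2.
input_tb = 10
block_mb = 128
replication = 2

input_mb = input_tb * 1_000_000                 # 10 TB ~= 10,000,000 MB
logical_blocks = input_mb // block_mb           # blocks before replication
block_replicas = logical_blocks * replication   # physical blocks on disk
raw_input_tb = input_tb * replication           # raw space for the input alone

print(logical_blocks, block_replicas, raw_input_tb)  # 78125 156250 20
```

Doubling the 20 TB of raw input storage to leave headroom for intermediate Map output is where the rough 40 TB figure comes from.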
2. Memory- if you want the computation to be as concurrent as possible, it 
depends on the amount of memory you allocate to the containers (the 
ApplicationMaster, Mappers, and Reducers) in the cluster configuration 
(yarn-site.xml), on the number of containers, and on the demands of the use 
case (perhaps each mapper needs at least 2048 MB). Otherwise, some containers 
will have to wait for free capacity (by default, YARN's task assignment 
considers only memory).
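For reference, the container-memory knobs mentioned above live in yarn-site.xml. A minimal sketch follows; the values are purely illustrative, not recommendations for any particular cluster:

```xml
<configuration>
  <!-- yarn-site.xml: illustrative values only -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value> <!-- memory each NodeManager offers to containers -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value> <!-- smallest container the scheduler will grant -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value> <!-- largest single container request allowed -->
  </property>
</configuration>
```

The degree of concurrency per node is then roughly the NodeManager memory divided by the container size you request.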
3. CPU- the same reasoning as for memory, but it may be irrelevant if it 
doesn't affect your container sizing and assignment.
*When you do configure your cluster, please also pay attention to the JVM heap 
size of each container.
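On the heap-size point: in MapReduce the JVM heap (-Xmx) is configured separately from the container size, and a common rule of thumb is to set it to roughly 75-80% of the container so the JVM's non-heap overhead still fits. A sketch in mapred-site.xml, with illustrative values only:

```xml
<configuration>
  <!-- mapred-site.xml: illustrative values only -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value> <!-- container size for each map task -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value> <!-- JVM heap, ~80% of the container -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value> <!-- container size for each reduce task -->
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value> <!-- JVM heap, ~80% of the container -->
  </property>
</configuration>
```

If the heap is set larger than the container, YARN will kill the task for exceeding its memory limit.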

Good luck 


On 2018/10/21 08:25:51, Amine Tengilimoglu <aminetengilimo...@gmail.com> wrote: 
> Hi all;
> 
>    I want to learn how I can estimate the hardware needed for a Hadoop
> cluster. Is there any standard or other guideline?
> 
>   For example, I have 10 TB of data, and I will analyze it... My replication
> factor will be 2.
> 
>    How much RAM do I need for one node? How can I estimate it?
>    How much disk do I need for one node? How can I estimate it?
>    How many CPU cores do I need for one node?
> 
> 
> thanks in advance..
> 
