Thanks, Stanley Shi.
On Tue, Apr 15, 2014 at 6:25 AM, Stanley Shi <[email protected]> wrote:

> Rough estimation: since word count requires very little computation, it is
> I/O-centric, so we can estimate based on disk speed.
>
> Assume 10 disks at 100 MBps each per node; that is about 1 GBps per node.
> Assuming 70% utilization in the mappers, we have 700 MBps per node. For 30
> nodes, that is about 20 GBps in total, so we need about 500 seconds for
> 10 TB of data. Adding some MapReduce overhead and the final merging, say
> 20% overhead, we can expect about 10 minutes here.
>
> On Tuesday, April 15, 2014, Shashidhar Rao <[email protected]>
> wrote:
>
>> Hi,
>>
>> Can somebody provide me a rough estimate of the time taken in hours/mins
>> for a cluster of say 30 nodes to run a MapReduce job to perform a word
>> count on say 10 TB of data, assuming that the hardware and the MapReduce
>> program are tuned optimally.
>>
>> Just a rough estimate; it could be 5 TB, 10 TB or 20 TB of data. If not
>> word count, it could just be to analyze the above size of data.
>>
>> Regards,
>> Shashidhar
>
> --
> Regards,
> *Stanley Shi,*
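For anyone who wants to plug in their own cluster numbers, here is Stanley's
back-of-the-envelope arithmetic restated as a minimal Python sketch. All the
figures (disk count, disk speed, utilization, overhead) are the assumptions
from his reply, not measured values:

    # Rough I/O-bound runtime estimate for a word count job.
    nodes = 30
    disks_per_node = 10
    disk_mb_per_s = 100          # sequential read speed per disk (assumed)
    utilization = 0.70           # fraction of raw disk bandwidth mappers see
    overhead = 0.20              # shuffle, final merge, framework overhead
    data_tb = 10

    node_mb_per_s = disks_per_node * disk_mb_per_s * utilization   # 700 MB/s
    cluster_mb_per_s = node_mb_per_s * nodes                       # ~21 GB/s
    data_mb = data_tb * 1_000_000                                  # decimal TB -> MB
    seconds = data_mb / cluster_mb_per_s * (1 + overhead)

    print(f"~{seconds / 60:.0f} minutes")   # prints "~10 minutes"

With these numbers it comes out to roughly 570 seconds, i.e. about 10 minutes,
matching the estimate above. The real job would of course also depend on
replication, network bandwidth during the shuffle, and how well the cluster
is tuned.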
