I know it is foolish of me to ask this, because a lot of factors affect it,
but why is it taking so much time? Can anyone suggest possible reasons for it,
or has anyone faced such an issue before?

Thanks,
Nikhil Kandoi
P.S. - I am using Hadoop-1.0.3 for this application, so I wonder if this version
has something to do with it.

From: Azuryy Yu [mailto:[email protected]]
Sent: Tuesday, December 17, 2013 4:14 PM
To: [email protected]
Subject: Re: Estimating the time of my hadoop jobs

Hi Kandoi,
It depends on:
how many cores each VNode has
how complicated your analysis application is

But I don't think it's normal to spend 3 hr processing 30 GB of data, even on
your *not good* hardware.





On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil <[email protected]> wrote:
Hello everyone,

I am new to Hadoop and would like to see if I'm on the right track.
Currently I'm developing an application which would ingest logs on the order of
60-70 GB of data per day and then do some analysis on them.
The infrastructure that I have is a 4-node cluster (all nodes on virtual
machines); all nodes have 4 GB of RAM.

But when I run a sample dataset of about 30 GB, it takes about 3 hrs to process
all of it.

I would like to know whether it is normal for this kind of infrastructure to
take this amount of time.
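
For reference, the numbers above can be turned into a rough throughput figure. The sketch below is only back-of-the-envelope arithmetic: it assumes processing time scales linearly with input size and that all 4 nodes share the work equally, neither of which has been measured here. The function names are made up for illustration.

```python
# Rough throughput arithmetic from the figures in this message.
# Assumptions (not measured): linear scaling with input size,
# and all 4 nodes contributing equally to the processing.

def throughput_mb_per_s(data_gb, hours):
    """Average throughput in MB/s for data_gb gigabytes processed in `hours`."""
    return data_gb * 1024 / (hours * 3600)

def estimated_hours(data_gb, sample_gb, sample_hours):
    """Linear extrapolation of run time from a sample run."""
    return data_gb * sample_hours / sample_gb

if __name__ == "__main__":
    cluster = throughput_mb_per_s(30, 3)   # sample run: 30 GB in 3 hrs
    per_node = cluster / 4                 # 4-node cluster
    print(f"cluster:  {cluster:.2f} MB/s")
    print(f"per node: {per_node:.2f} MB/s")
    for daily_gb in (60, 70):              # expected daily volume
        print(f"{daily_gb} GB/day -> ~{estimated_hours(daily_gb, 30, 3):.1f} hrs")
```

Under those assumptions this works out to roughly 2.8 MB/s for the whole cluster, about 0.7 MB/s per node, and 6 to 7 hours for the full 60-70 GB daily volume.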


Thank you

Nikhil Kandoi
