I know it is foolish of me to ask this, because there are a lot of factors that affect it, but why is it taking so much time? Can anyone suggest possible reasons, or has anyone faced such an issue before?
Thanks,
Nikhil Kandoi

P.S. - I am using Hadoop-1.0.3 for this application, so I wonder if this version has something to do with it.

From: Azuryy Yu [mailto:[email protected]]
Sent: Tuesday, December 17, 2013 4:14 PM
To: [email protected]
Subject: Re: Estimating the time of my hadoop jobs

Hi Kandoi,

It depends on:
- how many cores are on each VNode
- how complicated your analysis application is

But I don't think it's normal to spend 3 hours processing 30 GB of data, even on your *not good* hardware.

On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil <[email protected]> wrote:

Hello everyone,

I am new to Hadoop and would like to see if I'm on the right track. Currently I'm developing an application which will ingest logs on the order of 60-70 GB of data per day and then do some analysis on them.

The infrastructure I have is a 4-node cluster (all nodes on virtual machines), and all nodes have 4 GB of RAM. But when I try to run the dataset (which is only a sample dataset at this point) of about 30 GB, it takes about 3 hours to process all of it.

I would like to know whether it is normal for this kind of infrastructure to take this amount of time.

Thank you,
Nikhil Kandoi
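A rough sanity check: 30 GB in 3 hours is about 2.8 MB/s of aggregate throughput, well under 1 MB/s per node, which is far below what even modest virtualized hardware should sustain, so a configuration bottleneck is more likely than raw capacity. On Hadoop 1.x one common factor is the per-TaskTracker slot count. The sketch below is my own illustration, not from the thread: the class name SlotCheck is made up, but the property names are the standard Hadoop 1.x ones. It simply prints the settings that bound how many map/reduce tasks each node runs in parallel and how many input splits a 30 GB input produces.

```java
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: prints the Hadoop 1.x settings that bound per-node parallelism.
public class SlotCheck {
    public static void main(String[] args) {
        // JobConf picks up core-site.xml and mapred-site.xml from the classpath;
        // hdfs-site.xml may need to be added explicitly for dfs.* values to be resolved.
        JobConf conf = new JobConf();
        conf.addResource("hdfs-site.xml");

        // Concurrent map/reduce tasks each TaskTracker runs; Hadoop 1.x defaults to 2 each.
        System.out.println("map slots per node    = "
                + conf.get("mapred.tasktracker.map.tasks.maximum", "2"));
        System.out.println("reduce slots per node = "
                + conf.get("mapred.tasktracker.reduce.tasks.maximum", "2"));

        // Block size decides how many splits (and therefore map tasks) the input becomes:
        // 30 GB / 64 MB blocks is roughly 480 map tasks, processed a few at a time.
        System.out.println("dfs.block.size        = "
                + conf.get("dfs.block.size", "67108864"));
    }
}
```

With the default of 2 map slots per node, a 4-node cluster runs only 8 map tasks at a time, so a ~480-split input is processed in long waves; on 4 GB VMs, raising the slot count too far just trades queuing for swapping, so the numbers above are worth checking before anything else.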
