Hi Kandoi,

It depends on:
- how many cores each VM node has
- how complicated your analysis application is (see the sketch below)
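For a 30 GB input, MapReduce creates roughly one map task per HDFS block, so a few hundred maps, and only a handful of them can run at once on four 4 GB VMs, which means most of the job is spent queued rather than computing. Just as an illustration (this is not your application; the class name, paths, and reducer count are placeholder assumptions), a minimal generic driver looks roughly like this:

// A minimal, generic MapReduce driver -- NOT the poster's actual job, just a
// sketch of where job-level parallelism shows up. Class name, paths, and the
// reducer count are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogAnalysisDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "log-analysis");
        job.setJarByClass(LogAnalysisDriver.class);

        // Your own Mapper/Reducer would be set here; this sketch keeps the
        // identity defaults, so the output types follow TextInputFormat's
        // (LongWritable offset, Text line) records.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Map-side parallelism comes from the input splits (roughly one map
        // task per HDFS block), so a 30 GB input yields a few hundred maps.
        // Reduce-side parallelism is set explicitly; 4 is an arbitrary example.
        job.setNumReduceTasks(4);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

While the job runs, the JobTracker/ResourceManager web UI shows how many map and reduce tasks are actually running concurrently, which tells you quickly whether the cluster or the analysis code is the bottleneck.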
But I don't think it's normal to spend 3 hours processing 30 GB of data, even on your *not good* hardware.

On Tue, Dec 17, 2013 at 6:39 PM, Kandoi, Nikhil <[email protected]> wrote:

> Hello everyone,
>
> I am new to Hadoop and would like to see if I'm on the right track.
> Currently I'm developing an application which would ingest logs on the
> order of 60-70 GB of data/day and would then do some analysis on them.
>
> The infrastructure that I have is a 4-node cluster (all nodes on virtual
> machines); all nodes have 4 GB of RAM.
>
> But when I try to run the dataset (which is a sample dataset at this
> point) of about 30 GB, it takes about 3 hours to process all of it.
>
> I would like to know: is it normal for this kind of infrastructure to
> take this amount of time?
>
> Thank you,
>
> Nikhil Kandoi
