Thank you for your explanations. I work in pseudo-distributed mode, not on a cluster. Do your recommendations also apply in this mode, and what can I do so that execution time scales with the number of map/reduce tasks, if that is possible? In general, I do not understand how MapReduce is more performant for analysis than other systems such as data warehouses. For example, I tested a simple query with Hive, "select sum(col1) from table1": the result with Hive took on the order of 10 minutes, while with Oracle it took on the order of 0.20 minutes, for a data size of around 40 MB.
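For what it's worth, even in pseudo-distributed mode you can influence task counts per job. A minimal sketch, assuming MRv1-era property names (which is what Hadoop of that vintage used; the exact values here, 4 reducers and ~10 MB splits, are illustrative assumptions, not recommendations):

```sql
-- Hive session settings (MRv1-era property names; check your Hadoop version)
SET mapred.reduce.tasks=4;           -- request 4 reduce tasks for this job
SET mapred.max.split.size=10000000;  -- ~10 MB splits => ~4 map tasks for a 40 MB input
SELECT SUM(col1) FROM table1;
```

Note that for a 40 MB input most of the 10 minutes is per-job and per-task startup overhead, not actual computation, which is why Oracle wins at this scale; Hive's advantage only appears on data far larger than a single machine's capacity.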
Thank you.

2012/12/13 Mohammad Tariq <donta...@gmail.com>

> Hello Imen,
>
> If you have a huge number of tasks, then the overhead of managing map
> and reduce task creation begins to dominate the total job execution time.
> Also, more tasks means you need more free CPU slots. If the slots are not
> free, the data block of interest will be moved to some other node where
> free slots are available; this consumes time and also goes against the
> most basic principle of Hadoop, i.e. data locality. So, the number of
> maps and reduces should be raised keeping all these factors in mind,
> otherwise you may face performance issues.
>
> HTH
>
> Regards,
> Mohammad Tariq
>
>
> On Thu, Dec 13, 2012 at 4:11 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>
>> If the number of maps or reducers your job launches exceeds the
>> job queue/cluster capacity, CPU time will increase.
>>
>> On Dec 13, 2012 4:02 PM, "imen Megdiche" <imen.megdi...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am trying to increase the number of map and reduce tasks for a job,
>>> and even for the same data size, I noticed that the total CPU time
>>> increases, although I thought it would decrease. MapReduce is known for
>>> computational performance, but I do not see this in these small tests.
>>>
>>> What do you think about this issue?
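The overhead argument quoted above can be made concrete with a toy model. This is a sketch, not a benchmark: the per-task startup cost (3 s), throughput (50 MB/s), and slot count (2, typical of a pseudo-distributed box) are all assumed numbers for illustration.

```python
import math

def job_time(data_mb, n_tasks, slots=2, task_overhead_s=3.0, mb_per_s=50.0):
    """Toy estimate of job wall time when n_tasks share `slots` CPU slots.

    Each task pays a fixed startup overhead plus its share of the work;
    tasks run in waves of at most `slots` at a time.
    """
    work_s = data_mb / mb_per_s                      # total useful compute
    per_task_s = work_s / n_tasks + task_overhead_s  # one task's duration
    waves = math.ceil(n_tasks / slots)               # sequential waves of tasks
    return waves * per_task_s

# For a 40 MB input, adding tasks past the slot count only adds overhead:
for n in (1, 2, 8, 32):
    print(n, round(job_time(40, n), 2))
```

Under these assumptions, going from 1 to 2 tasks helps slightly (both slots busy), but 8 or 32 tasks make the job several times slower, which matches the behavior observed in the thread: on tiny inputs, per-task overhead dominates and more tasks mean more total time.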