Thank you for your explanations. I work in pseudo-distributed mode, not on a cluster. Do your recommendations also apply in this mode, and how can I get the execution time to decrease as the number of map and reduce tasks increases, if that is possible?
In general, I do not understand how MapReduce can be so much more performant for analysis than other systems such as data warehouses. For example, I tested the simple query "select sum(col1) from table1" with Hive: the result took on the order of 10 minutes, while Oracle took on the order of 0.20 minutes, for a data size of around 40 MB.
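A toy model of what may be happening (all numbers are illustrative, not measurements): if each task carries a fixed startup/scheduling overhead and only a few task slots run concurrently, as in pseudo-distributed mode, then adding tasks beyond the slot count adds overhead without adding parallelism.

```python
# Toy model of MapReduce job wall time (illustrative numbers only).
# Assumes a fixed per-task scheduling/JVM-startup overhead and a limited
# number of concurrent task slots, as in pseudo-distributed mode.

def job_time(n_tasks, slots, data_mb=40.0, mb_per_s=10.0, overhead_s=5.0):
    """Estimated wall time: per-task overhead plus the parallelized work."""
    work_s = data_mb / mb_per_s           # total compute time for the data
    waves = -(-n_tasks // slots)          # ceil division: task "waves" that fit in the slots
    return waves * overhead_s + work_s / min(n_tasks, slots)

# With only 2 slots, raising the task count past 2 adds whole waves of
# overhead without adding parallelism, so the estimate grows again.
for n in (1, 2, 4, 8):
    print(n, round(job_time(n, slots=2), 1))
```

This also hints at the Hive-vs-Oracle gap on 40 MB: the fixed job-startup overhead dominates on tiny inputs, so MapReduce only pays off at much larger data sizes.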

Thank you.


2012/12/13 Mohammad Tariq <donta...@gmail.com>

> Hello Imen,
>
>       If you have a huge number of tasks, then the overhead of managing map
> and reduce task creation begins to dominate the total job execution time.
> Also, more tasks means you need more free CPU slots. If the slots are not
> free, then the data block of interest will be moved to some other node where
> free slots are available; this consumes time and also goes against the
> most basic principle of Hadoop, i.e. data localization. So the number of
> maps and reduces should be raised keeping all these factors in mind,
> otherwise you may face performance issues.
>
> HTH
>
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Thu, Dec 13, 2012 at 4:11 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>
>> If the number of maps or reducers your job launches is more than the
>> jobqueue/cluster capacity, CPU time will increase.
>> On Dec 13, 2012 4:02 PM, "imen Megdiche" <imen.megdi...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am trying to increase the number of map and reduce tasks for a job, and
>>> even for the same data size, I noticed that the total CPU time increases,
>>> whereas I thought it would decrease. MapReduce is known for its
>>> computational performance, but I do not see this when I run these small tests.
>>>
>>> What do you think about this issue?
>>>
>>>
>
