Thanks for your reply Raymond.

Just to check my understanding: you mean that a >single< reduce phase cannot be parallelised? So I guess the problem in my case is that there is only one map process and one reduce process on a local machine? In other words: to do the work with parallel reduce processes, I would have to run multiple map processes first?

I think my problem with this topic is that I just don't know what exactly happens in the map and reduce phases.
Do you know a good link where I can read up on it? :)
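
As a rough illustration of the phases asked about here, the following is a minimal Python sketch of a word count. It is not Hadoop's API: the function names (`map_phase`, `partition`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions made up for this example. It only shows the general idea that map output is grouped by key, and that each reduce task gets its own bucket of keys.

```python
from collections import defaultdict
import zlib


def map_phase(line):
    """Map: emit one (word, 1) pair per word in the input line."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers):
    """Pick the reduce task responsible for a key, using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group the mapped values by key, one bucket of keys per reduce task."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reduce: combine all values of each key found in one bucket."""
    return {key: sum(values) for key, values in bucket.items()}


def run_job(lines, num_reducers):
    """Run map, shuffle and reduce one after another (hypothetical driver)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    buckets = shuffle(pairs, num_reducers)
    # The buckets do not share keys, so each call below could run in its
    # own process. Within one bucket, a single task does all the work.
    results = [reduce_phase(bucket) for bucket in buckets]
    merged = {}
    for partial in results:
        merged.update(partial)
    return merged


counts = run_job(["a b a", "b c"], num_reducers=2)
```

With `num_reducers=1` everything ends up in one bucket and is handled by one task, which would be consistent with a single busy core if the local runner uses only one reduce task (an assumption, not something the log confirms). In this sketch the number of buckets does not depend on how many map calls ran before.
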

Cheers,

Marek

On 10.06.2011 15:57, MilleBii wrote:
Hadoop uses a map/reduce algorithm; the reduce phase is the phase that
collects the results from the parallel execution.
It is inherently not possible to parallelize that phase.

-Raymond-

2011/6/10 Marek Bachmann<[email protected]>

Hello again,

I noticed that the reduce phase only uses one CPU core. This process
takes a very long time at 100 % usage, but only on one core. Is there a
way to parallelise this process across multiple cores on one local
machine? Could using Hadoop help in some way? I have no experience with
Hadoop at all. :-/

11/06/10 14:38:21 INFO mapred.JobClient:  map 100% reduce 94%
11/06/10 14:38:23 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:26 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:29 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:32 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:35 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:38 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:41 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:44 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:47 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:50 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:53 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:56 INFO mapred.LocalJobRunner: reduce>  reduce
11/06/10 14:38:57 INFO mapred.JobClient:  map 100% reduce 95%


Here is a copy of top's output while running a reduce:

top - 14:30:53 up 12 days, 33 min,  3 users,  load average: 0.81, 0.38, 0.35
Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.1%us,  0.2%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8003904k total,  5762520k used,  2241384k free,   120180k buffers
Swap:   418808k total,        4k used,   418804k free,  3713236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25835 root      20   0 4371m 1.6g  10m S  101 21.3   5:18.69 java

Thank you




