Hello again,
I noticed that the reduce phase uses only one CPU core. The reduce
takes a very long time at 100% usage, but only on one core. Is it
possible to parallelise it across multiple cores on one local
machine? Could using Hadoop help in some way? I have no experience with
Hadoop at all. :-/
11/06/10 14:38:21 INFO mapred.JobClient: map 100% reduce 94%
11/06/10 14:38:23 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:26 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:29 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:32 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:35 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:38 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:41 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:44 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:47 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:50 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:53 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:56 INFO mapred.LocalJobRunner: reduce > reduce
11/06/10 14:38:57 INFO mapred.JobClient: map 100% reduce 95%
Here is a copy of top's output while running a reduce:
top - 14:30:53 up 12 days, 33 min, 3 users, load average: 0.81, 0.38, 0.35
Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 0.2%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8003904k total, 5762520k used, 2241384k free, 120180k buffers
Swap: 418808k total, 4k used, 418804k free, 3713236k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25835 root 20 0 4371m 1.6g 10m S 101 21.3 5:18.69 java
Thank you