On Jun 10, 2011, at 7:50am, Marek Bachmann wrote:

> Thanks to you all,
> 
> so, to get to the point: is it possible to speed up the map/reduce task 
> (whatever it exactly does) on a single quad-core machine, and if so, does 
> anyone know a resource where I can get a little documentation? :-)

Get "Hadoop: The Definitive Guide" by Tom White.

And then set up your machine to run in pseudo-distributed mode.
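
As a rough illustration of why more than one reducer can run at once (a sketch
only, not Nutch or Hadoop code): Hadoop's default HashPartitioner assigns each
key to a reducer with (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so
every reducer gets a disjoint set of keys and can work independently of the
others. The class and method names below are made up for this example, plain
Java threads stand in for the reduce tasks, and it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: shows key partitioning across reducers, not real Hadoop API usage.
public class ReducePartitionSketch {

    // Same rule as Hadoop's default HashPartitioner: clear the sign bit, then take the modulus.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Stands in for one reduce task: sums the values of each key in its partition.
    static Map<String, Integer> reducePartition(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return out;
    }

    // Splits the map output by key, then reduces every partition on its own thread.
    static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput,
                                    int numReduceTasks) throws Exception {
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> r : mapOutput) {
            partitions.get(partitionFor(r.getKey(), numReduceTasks)).add(r);
        }

        ExecutorService pool = Executors.newFixedThreadPool(numReduceTasks);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (List<Map.Entry<String, Integer>> p : partitions) {
                Callable<Map<String, Integer>> task = () -> reducePartition(p);
                futures.add(pool.submit(task));
            }
            // Partitions hold disjoint keys, so merging their results never overwrites a key.
            Map<String, Integer> merged = new TreeMap<>();
            for (Future<Map<String, Integer>> f : futures) {
                merged.putAll(f.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("a.example", 1), Map.entry("b.example", 1),
                Map.entry("a.example", 1), Map.entry("c.example", 1));
        // Four reducers, one per core on a quad-core machine.
        System.out.println(run(mapOutput, 4));
    }
}
```

The result is the same for any number of reducers; only the amount of work
that can proceed at the same time changes. As far as I know, the local job
runner (the mapred.LocalJobRunner in your log) runs a single reducer inside one
JVM, which would match the one busy core you are seeing.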

-- Ken

> 
> Thank you once again.
> 
> Greetings,
> 
> Marek
> 
> On 10.06.2011 16:26, Julien Nioche wrote:
>> Raymond,
>> 
>>> Hadoop uses a map/reduce algorithm; the reduce phase is the phase that
>>> collects the results from the parallel execution.
>>> It is inherently not possible to parallelize that phase.
>>> 
>> 
>> Sorry to contradict you, Raymond, but this is incorrect. You can specify the
>> number of reducers to use, e.g.
>> 
>> -D mapred.reduce.tasks=$numTasks
>> 
>> but obviously this will work only in (pseudo-)distributed mode, i.e. with the
>> various Hadoop services running independently of Nutch.
>> 
>>> 
>>> -Raymond-
>>> 
>>> 2011/6/10 Marek Bachmann<[email protected]>
>>> 
>>>> Hello again,
>>>> 
>>>> I noticed that the reduce phase uses only one CPU core. The process
>>>> takes a very long time at 100% usage, but only on one core. Is there a
>>>> way to parallelise this process across multiple cores on one local
>>>> machine? Could using Hadoop help in some way? I have no experience with
>>>> Hadoop at all. :-/
>>>> 
>>>> 11/06/10 14:38:21 INFO mapred.JobClient:  map 100% reduce 94%
>>>> 11/06/10 14:38:23 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:26 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:29 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:32 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:35 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:38 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:41 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:44 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:47 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:50 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:53 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:56 INFO mapred.LocalJobRunner: reduce>  reduce
>>>> 11/06/10 14:38:57 INFO mapred.JobClient:  map 100% reduce 95%
>>>> 
>>>> 
>>>> Here is a copy of top's output while running a reduce:
>>>> 
>>>> top - 14:30:53 up 12 days, 33 min,  3 users,  load average: 0.81, 0.38, 0.35
>>>> Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
>>>> Cpu(s): 25.1%us,  0.2%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>>> Mem:   8003904k total,  5762520k used,  2241384k free,   120180k buffers
>>>> Swap:   418808k total,        4k used,   418804k free,  3713236k cached
>>>> 
>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 
>>>> 25835 root      20   0 4371m 1.6g  10m S  101 21.3   5:18.69 java
>>>> 
>>>> Thank you
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -MilleBii-
>>> 
>> 
>> 
>> 
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions





