@Julien & all: thanks for the correction.

@Ken, as it happens I got the book just last week and I'm in the process
of reading it. While reading it, I realised: oops, my answer is wrong.

You guys corrected it, fine.

I came to this conclusion because I have only ever used a pseudo-distributed
setup or a two-server cluster, and in those cases there was only one reducer.

The book recommends having fewer reducers than nodes, for optimisation
reasons.

@Marek,
Although I tried in the past, I never succeeded in getting more reducers.
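
For what it's worth, here is a minimal sketch of why the reduce phase can be
split across several reducers: each key is assigned to exactly one reducer, so
reducers can work on disjoint sets of keys at the same time. The formula in
partitionFor() is the one used by Hadoop's default HashPartitioner; everything
else (the class name, the method names, the example URLs) is made up for
illustration, and the sketch does not use Hadoop or Nutch at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not part of Hadoop or Nutch.
final class ReducerPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the reducer index they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int reducer = partitionFor(key, numReduceTasks);
            byReducer.computeIfAbsent(reducer, r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Assumed example keys; in a crawl these would typically be URLs.
        List<String> keys = Arrays.asList(
                "http://example.org/a",
                "http://example.org/b",
                "http://example.com/c",
                "http://example.net/d");
        int numReduceTasks = 4; // plays the role of mapred.reduce.tasks

        // Each reducer index gets its own disjoint list of keys.
        assign(keys, numReduceTasks).forEach((reducer, assigned) ->
                System.out.println("reducer " + reducer + " <- " + assigned));
    }
}
```

With numReduceTasks set to 1, every key lands on reducer 0, which matches the
single busy core in the original report.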




2011/6/10 Ken Krugler <[email protected]>

>
> On Jun 10, 2011, at 7:50am, Marek Bachmann wrote:
>
> > Thanks to you all,
> >
> > So, to get to the point: is it possible to speed up the map/reduce task
> > (whatever exactly it does) on a single quad-core machine, and if so, does
> > anyone know a resource where I can get a little documentation? :-)
>
> Get "Hadoop: The Definitive Guide" by Tom White.
>
> And then set up your machine to run in pseudo-distributed mode.
>
> -- Ken
>
> >
> > Thank you once again.
> >
> > Greetings,
> >
> > Marek
> >
> > On 10.06.2011 16:26, Julien Nioche wrote:
> >> Raymond,
> >>
> >>> Hadoop uses a map/reduce algorithm; the reduce phase is the phase that
> >>> collects the results from the parallel execution.
> >>> It is inherently not possible to parallelize that phase.
> >>>
> >>
> >> Sorry to contradict you, Raymond, but this is incorrect. You can specify
> >> the number of reducers to use, e.g.
> >>
> >> -D mapred.reduce.tasks=$numTasks
> >>
> >> but obviously this will work only in (pseudo-)distributed mode, i.e. with
> >> the various Hadoop services running independently of Nutch.
> >>
> >>
> >>
> >>
> >>
> >>
> >>>
> >>> -Raymond-
> >>>
> >>> 2011/6/10 Marek Bachmann<[email protected]>
> >>>
> >>>> Hello again,
> >>>>
> >>>> I noticed that the reduce phase uses only one CPU core. These processes
> >>>> take a very long time at 100% usage, but only on one core. Is there a
> >>>> way to parallelise these processes across multiple cores on one local
> >>>> machine? Could using Hadoop help in some way? I have no experience with
> >>>> Hadoop at all. :-/
> >>>>
> >>>> 11/06/10 14:38:21 INFO mapred.JobClient:  map 100% reduce 94%
> >>>> 11/06/10 14:38:23 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:26 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:29 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:32 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:35 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:38 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:41 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:44 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:47 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:50 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:53 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:56 INFO mapred.LocalJobRunner: reduce>  reduce
> >>>> 11/06/10 14:38:57 INFO mapred.JobClient:  map 100% reduce 95%
> >>>>
> >>>>
> >>>> Here is a copy of top's output while running a reduce:
> >>>>
> >>>> top - 14:30:53 up 12 days, 33 min,  3 users,  load average: 0.81, 0.38, 0.35
> >>>> Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
> >>>> Cpu(s): 25.1%us,  0.2%sy,  0.0%ni, 74.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >>>> Mem:   8003904k total,  5762520k used,  2241384k free,   120180k buffers
> >>>> Swap:   418808k total,        4k used,   418804k free,  3713236k cached
> >>>>
> >>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >>>> 25835 root      20   0 4371m 1.6g  10m S  101 21.3   5:18.69 java
> >>>>
> >>>> Thank you
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -MilleBii-
> >>>
> >>
> >>
> >>
> >
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom data mining solutions
>
>
>
>
>
>
>


-- 
-MilleBii-
