Hi,
Most of Mahout's algorithm implementations need to run a series of
map/reduce jobs to compute their results. By specifying a start phase
and an end phase, you can make the implementation run only some of these
internal jobs. You could, for example, use this to restart a failed
execution from the job that failed.
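
As a rough illustration of the restart idea, here is a minimal Python
sketch. It is not Mahout's actual code: run_pipeline, PhaseFailed and
the toy jobs are hypothetical names made up for this example, and a
"job" is just a Python callable standing in for a map/reduce job.

```python
from typing import Callable, List, Optional


class PhaseFailed(Exception):
    """Raised when a job fails; carries the 0-based phase index."""

    def __init__(self, phase: int, cause: Exception):
        super().__init__(f"phase {phase} failed: {cause}")
        self.phase = phase


def run_pipeline(jobs: List[Callable[[], None]],
                 start_phase: int = 0,
                 end_phase: Optional[int] = None) -> List[int]:
    """Run the jobs whose 0-based index lies in [start_phase, end_phase]."""
    if end_phase is None:
        end_phase = len(jobs) - 1
    executed = []
    for phase, job in enumerate(jobs):
        if phase < start_phase or phase > end_phase:
            continue  # outside the requested window: skip this job
        try:
            job()
        except Exception as exc:
            raise PhaseFailed(phase, exc) from exc
        executed.append(phase)
    return executed


# Toy pipeline: the second job fails on its first attempt only.
attempts = {"count": 0}


def flaky() -> None:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated cluster failure")


jobs = [lambda: None, flaky, lambda: None]

try:
    run_pipeline(jobs)
except PhaseFailed as failure:
    # Phase 0 already finished, so resume from the failed phase onwards.
    resumed = run_pipeline(jobs, start_phase=failure.phase)
```

The second call skips phase 0 and re-runs only phases 1 and 2, which is
the point of restarting with a later start phase. This assumes the
output of the skipped phase is still available from the first run.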
--sebastian
On 20.12.2010 12:41, Fernando Fernández wrote:
But does this affect the result? What will I get if I launch RowSimilarity
(cosine similarity) with --startPhase=1 and --endPhase=2? I don't fully
understand what exactly "phases" are in this case.
2010/12/20 Niall Riddell<[email protected]>
startPhase and endPhase shouldn't affect overall performance in any way;
they only mean that you can start at a later stage in a job pipeline.
You can execute specific MR jobs by designating a start phase and an end
phase. Note that the correct inputs must already be available for a phase
to start correctly.
The first MR job is index 0. So setting --startPhase 1 will execute the
2nd job onwards. Putting in --endPhase 2 would stop after the 3rd job.
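
To spell out the 0-based indexing, here is a small Python sketch of how
the two bounds select jobs. The function name phases_to_run and the
4-job pipeline are assumptions made for illustration only; the real
number of jobs depends on the algorithm being run.

```python
from typing import List, Optional


def phases_to_run(num_jobs: int,
                  start_phase: int = 0,
                  end_phase: Optional[int] = None) -> List[int]:
    """Return the 0-based job indices selected by the two inclusive bounds."""
    # With no end phase given, run through to the last job.
    last = num_jobs - 1 if end_phase is None else min(end_phase, num_jobs - 1)
    return list(range(max(start_phase, 0), last + 1))


# Hypothetical 4-job pipeline run with --startPhase 1 --endPhase 2:
selected = phases_to_run(4, start_phase=1, end_phase=2)
print([f"job #{i + 1} (index {i})" for i in selected])
# prints ['job #2 (index 1)', 'job #3 (index 2)']
```

With the defaults (no bounds given), every job is selected, so leaving
both parameters unset runs the whole pipeline.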
On 20 Dec 2010 11:17, "Fernando Fernández" <[email protected]> wrote:
Hello everyone,
Can anyone explain what exactly these two parameters (startPhase and
endPhase) are and how to use them? I'm trying to launch a RowSimilarity
job on a 50K-row matrix (100 columns) with cosine similarity and the
default startPhase and endPhase parameters, and I'm getting extremely
poor performance on a fairly big cluster (after 16 hours, it had only
reached 3% of the process). I think this could have something to do with
the startPhase and endPhase parameters. What do you think? How do these
parameters affect the RowSimilarity job?
Thanks in advance.
Fernando.