Hi,

Most of Mahout's algorithm implementations need to run a series of map/reduce jobs to compute their results. By specifying a start phase and an end phase you can make the implementation run only some of these internal jobs. You could use this, for example, to restart a failed execution without rerunning the jobs that had already completed.

--sebastian


On 20.12.2010 12:41, Fernando Fernández wrote:
But does this affect the result? What will I get if I launch RowSimilarity
(cosine similarity) with --startPhase=1 and --endPhase=2? I don't fully
understand what exactly the "phases" are in this case.

2010/12/20 Niall Riddell<[email protected]>

Startphase and endphase shouldn't impact overall performance in any way;
however, they do mean that you can start at a later stage in a job pipeline.

You can execute specific MR jobs by designating a startphase and endphase.
It goes without saying that the correct inputs must be available to start a
phase correctly.

The first MR job is index 0.  So setting --startPhase 1 will execute the 2nd
job onwards.  Putting in --endPhase 2 would stop after the 3rd job.
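
To make the zero-based indexing above concrete, here is a minimal Python sketch of how a driver could skip jobs outside the requested phase range. It is not Mahout's actual code: the function name run_phases, the four-job pipeline, and the job bodies are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Optional


def run_phases(jobs: List[Callable[[], None]],
               start_phase: Optional[int] = None,
               end_phase: Optional[int] = None) -> List[int]:
    """Run only the jobs whose zero-based index lies in [start_phase, end_phase].

    Returns the indices of the jobs that were actually executed.
    """
    executed = []
    for phase, job in enumerate(jobs):
        if start_phase is not None and phase < start_phase:
            # Skipped: the output of this job must already exist on disk
            # from an earlier run, or the next job has nothing to read.
            continue
        if end_phase is not None and phase > end_phase:
            break  # stop once the end phase has been passed
        job()
        executed.append(phase)
    return executed


# Hypothetical four-job pipeline; each placeholder job just records its name.
log: List[str] = []
pipeline = [lambda i=i: log.append(f"job {i}") for i in range(4)]

# Equivalent in spirit to "--startPhase 1 --endPhase 2":
# runs the 2nd and 3rd jobs only.
print(run_phases(pipeline, start_phase=1, end_phase=2))  # [1, 2]
```

With neither bound set, every job runs, which matches the default behaviour described in this thread. The sketch also shows why the phase options change which jobs run rather than how fast each one is.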
On 20 Dec 2010 11:17, "Fernando Fernández" <[email protected]> wrote:
Hello everyone,

Can anyone explain what exactly these two parameters (startphase and
endphase) are and how to use them? I'm trying to launch a RowSimilarity job
on a 50K-row matrix (100 columns) with cosine similarity and the default
startphase and endphase parameters, and I'm getting extremely poor
performance on quite a big cluster (after 16 hours, it had only reached 3%
of the process). I think that this could have something to do with the
startphase and endphase parameters. What do you think? How do these
parameters affect the RowSimilarity job?

Thanks in advance.
Fernando.
