Re: RowSimilarity startphase and endphase parameters

Sebastian Schelter Mon, 20 Dec 2010 03:46:43 -0800

Hi,

Most of Mahout's algorithm implementations need to run a series of map/reduce jobs to compute their results. By specifying a start and end phase, you can make the implementation run only some of these internal jobs. You could, for example, use this to restart a failed execution.
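To make the idea concrete, here is a minimal Java sketch of phase gating. It is not Mahout's actual code: the class `PhasedPipeline`, the method `runPhases` and the job names are hypothetical. It assumes zero-based phase indices and an inclusive end phase, and it only records which jobs would run instead of submitting real map/reduce jobs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a fixed sequence of jobs, only some of which are run. */
class PhasedPipeline {
    private final List<String> jobNames;

    PhasedPipeline(List<String> jobNames) {
        this.jobNames = jobNames;
    }

    /**
     * Runs every job whose zero-based index lies in [startPhase, endPhase]
     * (both bounds inclusive) and returns the names of the jobs that ran.
     */
    List<String> runPhases(int startPhase, int endPhase) {
        List<String> executed = new ArrayList<>();
        for (int phase = 0; phase < jobNames.size(); phase++) {
            if (phase < startPhase || phase > endPhase) {
                continue; // outside the requested window: skip this job
            }
            // Placeholder: a real pipeline would submit the map/reduce job here.
            executed.add(jobNames.get(phase));
        }
        return executed;
    }
}
```

Under these assumptions, a normal run corresponds to `runPhases(0, Integer.MAX_VALUE)`. If the third job failed, `runPhases(2, Integer.MAX_VALUE)` would skip the first two jobs, which only works if their output is still available as input for the third.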


--sebastian


On 20.12.2010 12:41, Fernando Fernández wrote:

But does this affect the result? What will I get if I launch RowSimilarity
(cosine similarity) with --startPhase=1 and --endPhase=2? I don't fully
understand what "phases" exactly are in this case.

2010/12/20 Niall Riddell <[email protected]>

Startphase and endphase shouldn't impact overall performance in any way;
however, they do mean that you can start at a later stage in a job pipeline.

You can execute specific MR jobs by designating a startphase and endphase.
It goes without saying that the correct inputs must be available to start a
phase correctly.

The first MR job is index 0. So setting --startPhase 1 will execute the
2nd job onwards. Putting in --endPhase 2 would stop after the 3rd job.
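The off-by-one between phase indices and job positions is easy to trip over, so here is a small hypothetical Java helper that maps the two options to the 1-based positions of the jobs that would run. `PhaseWindow`, `parse` and `positionsRun` are invented for illustration, and the defaults used when an option is absent (0 and `Integer.MAX_VALUE`) are an assumption meaning "run everything", not a confirmed description of Mahout's option parsing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads --startPhase/--endPhase and lists which jobs would run. */
class PhaseWindow {
    final int startPhase;
    final int endPhase;

    PhaseWindow(int startPhase, int endPhase) {
        this.startPhase = startPhase;
        this.endPhase = endPhase;
    }

    /** Parses "--startPhase N" and "--endPhase M"; absent options fall back to assumed defaults. */
    static PhaseWindow parse(String[] args) {
        int start = 0;                 // assumed default: begin with the first job
        int end = Integer.MAX_VALUE;   // assumed default: run through the last job
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("--startPhase")) {
                start = Integer.parseInt(args[i + 1]);
            } else if (args[i].equals("--endPhase")) {
                end = Integer.parseInt(args[i + 1]);
            }
        }
        return new PhaseWindow(start, end);
    }

    /** 1-based positions (1st, 2nd, ...) of the jobs that run in a pipeline of totalJobs jobs. */
    List<Integer> positionsRun(int totalJobs) {
        List<Integer> positions = new ArrayList<>();
        int last = Math.min(endPhase, totalJobs - 1);
        for (int phase = Math.max(startPhase, 0); phase <= last; phase++) {
            positions.add(phase + 1); // phase index 0 is the 1st job
        }
        return positions;
    }
}
```

For a pipeline of, say, five jobs, `PhaseWindow.parse(new String[] {"--startPhase", "1", "--endPhase", "2"}).positionsRun(5)` yields `[2, 3]`: the 2nd and 3rd jobs, matching the explanation above.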
On 20 Dec 2010 11:17, "Fernando Fernández" <[email protected]> wrote:

Hello everyone,

Can anyone explain what exactly these two parameters (startphase and
endphase) are and how to use them? I'm trying to launch a RowSimilarity job
on a 50K-row matrix (100 columns) with cosine similarity and the default
startphase and endphase parameters, and I'm getting extremely poor
performance on a quite big cluster (after 16 hours, it has only reached 3%
of the process). I think this could have something to do with the
startphase and endphase parameters. What do you think? How do these
parameters affect the RowSimilarity job?

Thanks in advance.
Fernando.
