Re: Canopies and RowSimilarity

Pat Ferrel Mon, 07 May 2012 07:47:15 -0700

As to my first question, what was your idea for using RowSimilarity to estimate canopy sizes? My corpus size changes often, so it would be interesting to find a way to generate the canopy parameters automatically.
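The thread never spells out the RowSimilarity idea, so the following is only a hypothetical heuristic, not the approach being asked about. In canopy clustering, points within the loose threshold T1 of a center join its canopy, and points within the tight threshold T2 stop being candidate centers, with T1 > T2. Assuming you have a sample of pairwise similarities in [0, 1] (for example cosine similarities of non-negative vectors), you could convert them to distances (1 - similarity) and read T1 and T2 off two quantiles of that distribution. The class and method names and the quantile choice are all illustrative assumptions.

```java
import java.util.Arrays;

final class CanopyThresholds {

  /** q-quantile (0 <= q <= 1) of an already sorted array, rounding the rank down. */
  static double quantile(double[] sorted, double q) {
    int idx = (int) Math.floor(q * (sorted.length - 1));
    return sorted[idx];
  }

  /**
   * Hypothetical heuristic: turns sampled pairwise similarities into distances
   * (1 - similarity) and returns {t1, t2} taken from two quantiles of those
   * distances. Because t1Quantile >= t2Quantile, the result satisfies t1 >= t2.
   */
  static double[] estimate(double[] similarities, double t1Quantile, double t2Quantile) {
    if (similarities.length == 0) {
      throw new IllegalArgumentException("need at least one similarity value");
    }
    if (t2Quantile < 0 || t1Quantile > 1 || t1Quantile < t2Quantile) {
      throw new IllegalArgumentException("require 0 <= t2Quantile <= t1Quantile <= 1");
    }
    double[] distances = new double[similarities.length];
    for (int i = 0; i < similarities.length; i++) {
      distances[i] = 1.0 - similarities[i];
    }
    Arrays.sort(distances);
    return new double[] {quantile(distances, t1Quantile), quantile(distances, t2Quantile)};
  }
}
```

Recomputing these quantiles on a fresh sample each time the corpus changes is what would make the parameters track corpus growth; whether quantiles of pairwise distance are a good proxy for canopy size is an open assumption here.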


On 5/7/12 5:39 AM, Suneel Marthi wrote:

Uploaded a patch that only deletes the temp output if -ow has been specified.
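The behavior described above (keep the temp output by default, delete it only when -ow is given) can be sketched as follows. This is not the MAHOUT-834 patch: the real job works on HDFS paths through Hadoop's FileSystem, whereas this sketch uses local java.nio.file paths so it is self-contained, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class TempDirPolicy {

  /**
   * Hypothetical helper: deletes tempDir and everything under it only when the
   * overwrite flag (-ow) was given. Otherwise the intermediate output is kept so
   * that other jobs can reuse it. Returns true if the directory was deleted.
   */
  static boolean cleanupIfOverwrite(Path tempDir, boolean overwrite) throws IOException {
    if (!overwrite || !Files.exists(tempDir)) {
      return false; // default behavior: retain the temp output
    }
    // Reverse lexicographic order visits children before their parent directories.
    try (Stream<Path> paths = Files.walk(tempDir)) {
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
    return true;
  }
}
```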

________________________________
From: Sebastian Schelter <[email protected]>
To: [email protected]
Sent: Monday, May 7, 2012 8:18 AM
Subject: Re: Canopies and RowSimilarity

The problem with the patch in MAHOUT-834 is that it always cleans the
temp dir, which, as Sean noted in the comments, we don't want as the
standard behavior. Other jobs sometimes rely on the temp output, so we
should retain it.

We could, however, clean the temp dir when -ow is provided.

On 07.05.2012 14:02, Suneel Marthi wrote:

1. Please take a look at MAHOUT-834 for the -ow option; there is a patch
available that is pending review.

2. Please take a look at MAHOUT-979 for calculating the number of columns from
the input matrix. I can work on this and upload a patch sometime this week.
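One plausible way to derive the column count instead of passing it by hand is a single pass over the rows that tracks the largest column index seen. This sketch is not the MAHOUT-979 implementation; it assumes each sparse row is represented as a map from column index to value, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class ColumnCount {

  /**
   * Hypothetical helper: returns 1 + the largest column index present in any
   * row, or 0 for an empty matrix. Rows are sparse maps of column index to value.
   */
  static int numberOfColumns(List<Map<Integer, Double>> rows) {
    int maxIndex = -1;
    for (Map<Integer, Double> row : rows) {
      for (int col : row.keySet()) {
        maxIndex = Math.max(maxIndex, col);
      }
    }
    return maxIndex + 1;
  }
}
```

A caveat with this approach: if the trailing columns are entirely zero, no row stores them and the count comes out too small, so a declared vector cardinality, where the input format records one, would be the more reliable source.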



________________________________
From: Sebastian Schelter <[email protected]>
To: [email protected]
Sent: Monday, May 7, 2012 12:51 AM
Subject: Re: Canopies and RowSimilarity

On 06.05.2012 23:08, Pat Ferrel wrote:

BTW, could I vote for a better description of how to use RowSimilarity?
Shouldn't it have a -ow parameter? It would also be nice if it
calculated the number of columns from the input "matrix". Without
these, it is hard to automate in scripts.

Could you open a JIRA ticket for that? Sounds like good feature
requests. Would you like to tackle these things yourself?

--sebastian
