As to my first question, what was your idea for using RowSimilarity to
estimate canopy sizes? My corpus size changes often, so it would be
interesting to find a way to automatically generate the canopy parameters.
On 5/7/12 5:39 AM, Suneel Marthi wrote:
Uploaded a patch that only deletes the temp output if -ow has been specified.
________________________________
From: Sebastian Schelter<[email protected]>
To: [email protected]
Sent: Monday, May 7, 2012 8:18 AM
Subject: Re: Canopies and RowSimilarity
The problem with the patch in MAHOUT-834 is that it always cleans the
temp dir, which we don't want as standard behavior, as Sean noted
in the comments. Sometimes other jobs rely on the temp output, so we
should retain it.
We could however include the temp dir cleaning when -ow is provided.
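
To illustrate the behavior proposed here, a minimal sketch in plain Java
follows. It is not the MAHOUT-834 patch and does not use Mahout or Hadoop
classes; the class name TempDirCleanup, the method cleanTempIfOverwrite and
the boolean overwrite flag (standing in for -ow) are hypothetical names
chosen for illustration, and a local filesystem path stands in for the
job's temp dir.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TempDirCleanup {

    /**
     * Deletes tempDir recursively, but only when overwrite was requested.
     * Returns true if the directory was deleted, false if it was left alone.
     * The overwrite flag is a stand-in for the -ow option (assumption).
     */
    static boolean cleanTempIfOverwrite(Path tempDir, boolean overwrite) throws IOException {
        if (!overwrite || !Files.exists(tempDir)) {
            // Default behavior: keep the temp output so other jobs can reuse it.
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(tempDir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

Without the flag the temp output is retained, which keeps the standard
behavior unchanged for jobs that depend on it.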
On 07.05.2012 14:02, Suneel Marthi wrote:
1. Please take a look at MAHOUT-834 for the -ow option; there is a patch
available that is pending review.
2. Please take a look at MAHOUT-979 for calculating the number of columns from
the input matrix. I can work on this and upload a patch sometime this week.
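
As a rough illustration of what deriving the column count from the input
could mean, here is a small sketch in plain Java. It assumes each row is a
sparse map from column index to value and takes the column count to be the
largest index seen plus one. The class ColumnCount and method numColumns are
hypothetical names; the approach actually taken in MAHOUT-979 may differ.

```java
import java.util.List;
import java.util.Map;

public class ColumnCount {

    /**
     * Returns the number of columns implied by a list of sparse rows,
     * where each row maps a zero-based column index to its value.
     * An empty matrix yields 0.
     */
    static int numColumns(List<Map<Integer, Double>> rows) {
        int maxIndex = -1;
        for (Map<Integer, Double> row : rows) {
            for (int index : row.keySet()) {
                maxIndex = Math.max(maxIndex, index);
            }
        }
        return maxIndex + 1;
    }
}
```

A script could then pass this value along instead of requiring the caller to
know the number of columns in advance.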
________________________________
From: Sebastian Schelter<[email protected]>
To: [email protected]
Sent: Monday, May 7, 2012 12:51 AM
Subject: Re: Canopies and RowSimilarity
On 06.05.2012 23:08, Pat Ferrel wrote:
BTW Could I vote for a better description of using RowSimilarity?
Shouldn't it have a -ow parameter? It would also be nice if it
calculated the number of columns from the input "matrix". These things
make it hard to automate in scripts.
Could you open a JIRA ticket for that? Sounds like good feature
requests. Would you like to tackle these things yourself?
--sebastian