I didn't notice the --clusters option just reading the patch. If that
puts the clusters into a specific directory, then fine. I was suggesting
that the default be $output/state, rather than writing them all directly
to $output as the patch currently does.
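A minimal sketch of the suggested default, assuming the driver receives the output directory plus an optional --clusters value. The class and method names are hypothetical and not part of Mahout, and java.nio.file.Path stands in for Hadoop's Path to keep it self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not Mahout's API: illustrates the proposed default only.
final class ClustersDirDefault {
    private ClustersDirDefault() {}

    /**
     * Directory for the cluster files: the --clusters value when one was given,
     * otherwise a "state" sub-directory of the output directory ($output/state).
     */
    static Path resolveClustersDir(Path output, String clustersOption) {
        if (clustersOption != null && !clustersOption.isEmpty()) {
            return Paths.get(clustersOption);
        }
        return output.resolve("state");
    }
}
```

With no --clusters value, resolveClustersDir(Paths.get("output"), null) points at output/state; an explicit value always wins.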
If you want some help, I have a little time before next week and more after that.
On Jun 26, 2009, at 3:04 PM, Jeff Eastman wrote:
That looks reasonable, just reading the patch. You might also want to
put the clusters-x files into a state (or clusters) sub-directory to
reduce noise in the output directory and improve consistency with MS and
Dirichlet (which do not themselves agree on which directory name to use).
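A rough sketch of the layout being proposed: each iteration's clusters-x directory sits under one sub-directory instead of the top level of the output. The sub-directory name is a parameter because MS and Dirichlet use different names; numbering iterations from 0 and the helper names are assumptions for illustration.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-iteration cluster directories grouped under one sub-directory.
final class IterationDirs {
    private IterationDirs() {}

    /** Directory holding iteration i's clusters, e.g. output/state/clusters-2. */
    static Path clustersForIteration(Path output, String subDir, int iteration) {
        return output.resolve(subDir).resolve("clusters-" + iteration);
    }

    /** Directories for iterations 0..numIterations-1, all under output/subDir. */
    static List<Path> allIterations(Path output, String subDir, int numIterations) {
        List<Path> dirs = new ArrayList<>();
        for (int i = 0; i < numIterations; i++) {
            dirs.add(clustersForIteration(output, subDir, i));
        }
        return dirs;
    }
}
```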
Grant I
Check out the patch I just put up on M-138
On Jun 26, 2009, at 12:32 PM, Jeff Eastman wrote:
Grant Ingersoll wrote:
Isn't the KMeansJob pretty much redundant, assuming we add a
parameter to KMeansDriver to take in the number of reduce tasks?
The purpose of the clustering jobs, in general, was
Of course, this should support assigning *any* input to clusters, not just
the original input.
On Fri, Jun 26, 2009 at 9:32 AM, Jeff Eastman wrote:
> 2. Optionally cluster the input data points by assigning them to clusters.
> This would be with probabilities in the case of FuzzyKMeans and Dirich
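For the probabilistic case, a sketch of the textbook fuzzy c-means membership, u_c = 1 / sum_j (d_c / d_j)^(2/(m-1)), where d is the Euclidean distance from the point to each centroid and m > 1 is the fuzziness exponent. Whether Mahout's FuzzyKMeans computes exactly this is an assumption here, and Dirichlet's model-based probabilities are not covered. The class name is hypothetical.

```java
// Hypothetical sketch of fuzzy c-means memberships for a single point.
final class FuzzyMembership {
    private FuzzyMembership() {}

    /** Membership of the point in each cluster; the values sum to 1. */
    static double[] memberships(double[] point, double[][] centroids, double m) {
        if (m <= 1.0) throw new IllegalArgumentException("m must be > 1");
        int k = centroids.length;
        double[] dist = new double[k];
        int zeroCount = 0;
        for (int c = 0; c < k; c++) {
            double s = 0.0;
            for (int i = 0; i < point.length; i++) {
                double d = point[i] - centroids[c][i];
                s += d * d;
            }
            dist[c] = Math.sqrt(s);
            if (dist[c] == 0.0) zeroCount++;
        }
        double[] u = new double[k];
        if (zeroCount > 0) {
            // The point coincides with one or more centroids: split membership among them.
            for (int c = 0; c < k; c++) u[c] = dist[c] == 0.0 ? 1.0 / zeroCount : 0.0;
            return u;
        }
        double exp = 2.0 / (m - 1.0);
        for (int c = 0; c < k; c++) {
            double denom = 0.0;
            for (int j = 0; j < k; j++) denom += Math.pow(dist[c] / dist[j], exp);
            u[c] = 1.0 / denom;
        }
        return u;
    }
}
```

A k-means style hard assignment is the limiting case: the point simply goes to the cluster with the largest membership (the nearest centroid).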
Grant Ingersoll wrote:
Isn't the KMeansJob pretty much redundant, assuming we add a parameter
to KMeansDriver to take in the number of reduce tasks?
The purpose of the clustering jobs, in general, was to simplify
computing the clusters and then clustering the data. It has been applied
- and cha
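To make the two steps concrete, here is an in-memory sketch (plain Lloyd's k-means, not Mahout's MapReduce implementation) of what a combined job bundles: iterate to compute the clusters, then assign points to them. The convergence threshold delta, the maxIter cap, and all names are assumptions for illustration. clusterData accepts any points, not only the ones used to compute the clusters.

```java
// Hypothetical in-memory sketch of "compute clusters, then cluster the data".
final class TwoPhaseKMeans {
    private TwoPhaseKMeans() {}

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) best = c;
        }
        return best;
    }

    /** Step 1: iterate until no centroid moves more than delta, or maxIter is reached. */
    static double[][] computeClusters(double[][] points, double[][] initial, double delta, int maxIter) {
        int k = initial.length;
        int dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initial[c].clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) sums[c][i] += p[i];
            }
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous centroid
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) updated[i] = sums[c][i] / counts[c];
                if (Math.sqrt(sqDist(updated, centroids[c])) > delta) converged = false;
                centroids[c] = updated;
            }
            if (converged) break;
        }
        return centroids;
    }

    /** Step 2: label each point with the index of its nearest final centroid. */
    static int[] clusterData(double[][] points, double[][] centroids) {
        int[] labels = new int[points.length];
        for (int p = 0; p < points.length; p++) labels[p] = nearest(points[p], centroids);
        return labels;
    }

    /** The combined "job": both steps in sequence on the same input. */
    static int[] run(double[][] points, double[][] initial, double delta, int maxIter) {
        return clusterData(points, computeClusters(points, initial, delta, maxIter));
    }
}
```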
On Jun 26, 2009, at 11:32 AM, Grant Ingersoll wrote:
Isn't the KMeansJob pretty much redundant, assuming we add a
parameter to KMeansDriver to take in the number of reduce tasks?
Also, the variable naming in KMeansJob suggests that the number of reduce
tasks (numCentroids) is actually the "k" in k-
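A sketch of keeping the two settings apart, assuming reducers are keyed by a cluster identifier. The class, the field names, and the "C" + index key format are hypothetical. The partition formula matches Hadoop's default HashPartitioner, but it is applied here to a Java String's hashCode, which is not how Hadoop's Text hashes, so real reducer placement would differ.

```java
// Hypothetical parameter holder: k and the number of reduce tasks are independent settings.
final class KMeansSettings {
    final int k;              // number of clusters (the "k" in k-means)
    final int numReduceTasks; // parallelism of the reduce phase only

    KMeansSettings(int k, int numReduceTasks) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        if (numReduceTasks < 1) throw new IllegalArgumentException("numReduceTasks must be >= 1");
        this.k = k;
        this.numReduceTasks = numReduceTasks;
    }

    /** Same formula as Hadoop's HashPartitioner, applied to a String cluster id. */
    static int partitionFor(String clusterId, int numReduceTasks) {
        return (clusterId.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** How many of the k clusters each reducer would receive under hash partitioning. */
    int[] clustersPerReducer() {
        int[] counts = new int[numReduceTasks];
        for (int c = 0; c < k; c++) counts[partitionFor("C" + c, numReduceTasks)]++;
        return counts;
    }
}
```

Any number of reducers can serve any k; setting numReduceTasks equal to k is just one possible choice.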