Done in separate email.

On 7/11/12 1:27 PM, Jeff Eastman wrote:
+user@

-------- Original Message --------
Subject:        Re: Cluster Evaluation 0.8 style
Date:   Wed, 11 Jul 2012 16:17:29 -0400
From:   Jeff Eastman <[email protected]>
To:     Pat Ferrel <[email protected]>



It would be more useful for debugging if you could provide the result
clusters and a set of representative points for each. These are more
likely to be tractable in terms of debugging than the entire 8G dataset.


On 7/11/12 3:40 PM, Pat Ferrel wrote:
> As I've said before this issue is still a problem.
> https://issues.apache.org/jira/browse/MAHOUT-1020?focusedCommentId=13409696#comment-13409696
> This should be reopened, and I sent you a link to get my data (only 8G,
> good luck!)
>
> My confusion with the per cluster density measure is because in 0.8 an
> output file is required for clusterdump but the per cluster density
> measure is not written to it. It's in the INFO output to STDOUT. When
> I run a bunch of these the STDOUT is lost so I'll have to modify my
> scripts or update my KFinder code. I'd vote to include it in the
> output file in the future.
>
> The only problem I've seen with the per cluster Intra-cluster density
> is that I get a lot of pruned clusters sometimes and the Intra-Cluster
> Density is not calculated for them. I think we've discussed this in
> the past.
>
> 12/07/11 12:22:12 INFO evaluation.ClusterEvaluator: Intra-Cluster
> Density[766] = 0.6243875150474454
>
> I really would like to get this stuff working and am willing to
> provide whatever help you need if you are in a position to work on it.
> I have 0.8-SNAPSHOT building. I am inexperienced at debugging in this
> kind of large-data situation but willing to learn. If you'd like me to
> try something out, just point me in the right direction.
>
> I'm also happy to test Ted's inter-cluster stuff too.
>
>
> On 7/11/12 11:46 AM, Jeff Eastman wrote:
>> The ClusterEvaluator has methods for both inter-cluster density and
>> intra-cluster density. The former computes the density using the
>> cluster centers, while the latter uses a set of representative points
>> extracted from the clustered points. This reduces the computational
>> overhead of calculating a density from all of the points from each
>> cluster.
>>
>> The unit test uses synthetic data and produces reasonable looking
>> results afaict. Have you had negative experiences with that?
>>
>> On 7/11/12 1:21 PM, Pat Ferrel wrote:
>>> ...
>>>
>>> It was my understanding that the ClusterEvaluator included an
>>> attempt to provide this measure with intra-cluster density per
>>> cluster though it looks like that output has been removed?
>>>
>>
>
>
>
>



