Scott,

thx a bunch for the pointer, very useful.

One thing I would like to clarify, though. I forgot to mention that I ran canopy with T1 == T2 (this was suggested in some post as a quick way to find the T2 that yields a particular number of canopies). You mention the Jiras you opened (I'm going to check them right after); could one of them be about this "special" T1 == T2 case?
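In the textbook sequential canopy procedure (points within T2 of a center leave the input set, points within T1 join the canopy), setting T1 == T2 means every point that joins a canopy is also removed, so the canopies form a partition and counting them for a few candidate thresholds is cheap. The sketch below illustrates that idea only; the names `canopy_partition` and `count_by_threshold` are hypothetical, Euclidean distance is an assumption, and Mahout's own implementation may behave differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    # Assumed distance measure; any metric would do for the sketch.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_partition(points: Sequence[Point], t: float) -> List[List[int]]:
    """Sequential canopy with T1 == T2 == t; returns canopies as index lists."""
    remaining = list(range(len(points)))  # the "input set"
    canopies: List[List[int]] = []
    while remaining:
        center = remaining.pop(0)  # next unassigned point becomes a center
        members = [center]
        rest = []
        for i in remaining:
            if euclidean(points[center], points[i]) < t:
                members.append(i)  # within T1 == T2: joins AND leaves the input set
            else:
                rest.append(i)  # outside the threshold: stays for later canopies
        remaining = rest
        canopies.append(members)
    return canopies


def count_by_threshold(points: Sequence[Point], thresholds: Sequence[float]) -> Dict[float, int]:
    # Number of canopies produced for each candidate threshold.
    return {t: len(canopy_partition(points, t)) for t in thresholds}
```

For example, with the 1-D points `[(0.0,), (0.5,), (3.0,), (3.4,), (10.0,)]`, `count_by_threshold(points, [0.1, 1.0, 5.0])` returns `{0.1: 5, 1.0: 3, 5.0: 2}`, and for every threshold each point index appears in exactly one canopy.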

br
reinis

On 24.03.2014 15:28, Scott C. Cote wrote:
Reinis,

The documentation has several Jiras open, one of them with my name on it.

Fortunately, the canopy cluster technology has a good page (as well as
some outdated pages).

Please see this link for your question:

        http://mahout.apache.org/users/clustering/canopy-clustering.html


as I believe that it is well written.

To directly answer your question:

Remember that T1 > T2 and points within T2 are added to the cluster and
removed from the "input set", while points within T1 are added to the
cluster but NOT removed from the "input set" (and therefore may be added
to another cluster later in the process).
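A minimal Python sketch of the classic sequential canopy procedure described above may make the overlap concrete. It assumes Euclidean distance and takes canopy centers in input order; the names `canopy` and `euclidean` and the sample points are hypothetical, and this is not Mahout's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    # Assumed distance measure for the sketch.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: Sequence[Point], t1: float, t2: float) -> List[List[int]]:
    """Classic sequential canopy; returns canopies as lists of point indices."""
    if t1 < t2:
        raise ValueError("canopy requires t1 >= t2")
    remaining = list(range(len(points)))  # the "input set"
    canopies: List[List[int]] = []
    while remaining:
        center = remaining.pop(0)  # the center itself always leaves the input set
        members = [center]
        still_remaining = []
        for i in remaining:
            d = euclidean(points[center], points[i])
            if d < t1:
                members.append(i)  # within T1: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # not within T2: stays, may join later canopies
        remaining = still_remaining
        canopies.append(members)
    return canopies
```

Running `canopy([(0.0,), (1.2,), (1.5,)], t1=2.0, t2=1.0)` gives `[[0, 1, 2], [1, 2]]`. Point 2 lies within T1 (but not T2) of the first center, so it stays in the input set, and it then lands in the second canopy as well, even though it is within T2 of that second center. In this formulation, only the removal from the input set is tied to T2; earlier T1 memberships are kept.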

Scott

On 3/24/14, 6:44 AM, "Reinis Vicups" wrote:

Hi,

apparently I am misunderstanding the way canopy works. I thought that
once a data point is added to a canopy, it is removed from the list of
to-be-clustered points, so each point is assigned to exactly one canopy.

In the example below this is not the case:

:C-28{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:238.981, 468:40.572,
556:10.985, 889:8.678, 1101:114
:C-29{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:217.804, 468:33.560,
556:10.985, 889:8.678, 1101:113
:C-30{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:215.841, 468:37.231,
556:10.985, 889:8.678, 1101:113
:C-31{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:206.121, 468:32.243,
556:10.985, 889:8.678, 1101:112

So is it correct to assume that only the points within T2 are assigned
to exactly one canopy, or can even points within T2 be assigned to more
than one canopy?

greets
reinis
