Scott, thx a bunch for the pointer, very useful.
One thing I would like to clarify, though. I forgot to mention that I ran canopy with T1 == T2 (this was suggested in some post as a fast way to find a T2 that gives a particular number of canopies). You mention Jiras you opened (going to check them right after). Could it be that one of them is for this "special" T1 == T2 case?
br
reinis

On 24.03.2014 15:28, Scott C. Cote wrote:

Reinis,

The documentation has several Jiras open, one of them with my name on it. Fortunately, the canopy cluster technology has a good page (as well as some outdated pages). Please see this link for your question:

http://mahout.apache.org/users/clustering/canopy-clustering.html

as I believe that it is well written.

To directly answer your question: Remember that T1 > T2 and points within T2 are added to the cluster and removed from the "input set", while points within T1 are added to the cluster but NOT removed from the "input set" (and therefore may be added to another cluster later in the process).

Scott

On 3/24/14, 6:44 AM, "Reinis Vicups" <[email protected]> wrote:

Hi,

apparently I am misunderstanding the way canopy works. I thought that once a datapoint is added to a canopy, it is removed from the list of to-be-clustered points, so that each point is assigned to exactly one canopy.

In the example below this is not the case:

:C-28{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:238.981, 468:40.572, 556:10.985, 889:8.678, 1101:114
:C-29{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:217.804, 468:33.560, 556:10.985, 889:8.678, 1101:113
:C-30{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:215.841, 468:37.231, 556:10.985, 889:8.678, 1101:113
:C-31{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:206.121, 468:32.243, 556:10.985, 889:8.678, 1101:112

So is the correct assumption that only the points within T2 get assigned to only one canopy, or can even points within T2 get assigned to more than one canopy?

greets
reinis
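
For reference, the T1/T2 rule from Scott's reply can be sketched in a few lines of Python. This is only a minimal illustration of the textbook sequential canopy procedure, not Mahout's implementation, which may differ in detail. The names `canopy` and `euclidean` and the sample points are made up for the example, and Euclidean distance is an assumption.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2, distance=euclidean):
    """Textbook canopy sketch (hypothetical helper, not Mahout's code).

    Points closer than t1 to a center join that canopy.
    Points closer than t2 are additionally removed from the input set,
    so they cannot join or start another canopy later.
    """
    if t1 < t2:
        raise ValueError("t1 must be >= t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # next unprocessed point becomes a center
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)  # loose threshold: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # outside tight threshold: stays in input set
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


pts = [(0.0,), (1.0,), (3.0,), (10.0,)]
for center, members in canopy(pts, t1=4.0, t2=2.0):
    print(center, members)
```

With `t1=4.0, t2=2.0` the point `(3.0,)` lies between the two thresholds of the first center, so it ends up in two canopies. With `t1 == t2` the "join" zone and the "remove" zone coincide, so in this sketch every point lands in exactly one canopy.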
