Reinis,

I don’t know - perhaps one of the other denizens of Users has an answer?

SCott

On 3/24/14, 10:13 AM, "Reinis Vicups" <[email protected]> wrote:

>Scott,
>
>thx a bunch for the pointer, very useful.
>
>One thing I would like to clarify tho. I forgot to mention that I ran
>canopy with T1 == T2 (this was suggested in some post as a method to
>find in a fast way T2 that gives particular number of canopies). You
>mention jiras you opened (gonna check them right after) - could it be
>one of them is for this "special" T1 == T2 case?
>
>br
>reinis
>
>On 24.03.2014 15:28, Scott C. Cote wrote:
>> Reinis,
>>
>> The documentation has several Jira's open - with one with my name on it.
>>
>> Fortunately, the canopy cluster technology has a good page (as well as
>> some outdated pages).
>>
>> Please see this link for your question:
>>
>> http://mahout.apache.org/users/clustering/canopy-clustering.html
>>
>> as I believe that it is well written.
>>
>> To directly answer your question:
>>
>> Remember that T1 > T2 and points within T2 are added to the cluster and
>> removed from the "input set", while points within T1 are added to the
>> cluster but NOT removed from the "input set" (and therefore may be added
>> to another cluster later in the process).
>>
>> SCott
>>
>> On 3/24/14, 6:44 AM, "Reinis Vicups" <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> apparently I am misunderstanding the way canopy works. I thought that
>>> once a datapoint is added to a canopy, it is removed from the list of
>>> to-be-clustered points, thus one point is assigned to one canopy.
>>>
>>> In the example below this is not the case:
>>>
>>> :C-28{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:238.981, 468:40.572,
>>> 556:10.985, 889:8.678, 1101:114
>>> :C-29{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:217.804, 468:33.560,
>>> 556:10.985, 889:8.678, 1101:113
>>> :C-30{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:215.841, 468:37.231,
>>> 556:10.985, 889:8.678, 1101:113
>>> :C-31{n=1 c=[70:11.686, 72:7.170, 236:8.182, 396:206.121, 468:32.243,
>>> 556:10.985, 889:8.678, 1101:112
>>>
>>> So is the correct assumption that only the points within T2 get assigned
>>> to only one canopy or even points within T2 can get assigned to more
>>> than one canopy?
>>>
>>> greets
>>> reinis
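
For anyone following the thread, the T1/T2 rule described in the 15:28 reply can be sketched in a few lines of Python. This is only a single-machine illustration of the textbook procedure, not Mahout's implementation: the function name `canopy`, the use of Euclidean distance, the choice of the first remaining point as the next center, and the inclusive `<=` comparisons are all assumptions made for the example.

```python
import math

def canopy(points, t1, t2):
    """Textbook single-pass canopy sketch (hypothetical helper, not Mahout code).

    Returns a list of (center_index, member_indices) pairs.
    """
    if t1 < t2:
        raise ValueError("expected T1 >= T2")
    remaining = list(range(len(points)))  # the "input set"
    canopies = []
    while remaining:
        center = remaining[0]  # assumption: take the first remaining point
        members, kept = [], []
        for i in remaining:
            d = math.dist(points[center], points[i])
            if d <= t1:
                members.append(i)  # within T1: joins this canopy
            if d > t2:
                kept.append(i)     # outside T2: stays in the input set
        canopies.append((center, members))
        remaining = kept           # the center itself (d == 0) is always removed
    return canopies

pts = [(0.0,), (1.0,), (2.5,), (10.0,)]
print(canopy(pts, 3.0, 1.5))  # T1 > T2
print(canopy(pts, 1.5, 1.5))  # T1 == T2
```

In this sketch, the point at 2.5 lies within T1 but outside T2 of the first center, so with T1 > T2 it joins the first canopy and later also becomes the center of a second one. With T1 == T2, every point that joins a canopy is removed at the same time, so each point ends up in exactly one canopy. Whether Mahout's distributed canopy job behaves the same way for T1 == T2 is not something this sketch can answer.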
