Hey Paritosh,

Thanks for the help. That does give a clearer picture.

On Mon, Aug 13, 2012 at 7:57 AM, Paritosh Ranjan <[email protected]> wrote:

> I can try to answer few :
>
> 1) I don't know.
>
> 2) Use org.apache.mahout.math.NamedVector to identify clusters.
>
> 3) Yes, new points can be identified without clustering all over again. See
> org.apache.mahout.clustering.classify.ClusterClassifier
> org.apache.mahout.clustering.iterator.ClusterIterator
> org.apache.mahout.clustering.classify.ClusterClassificationDriver
>
> 4) I don't think there is any built in implementation for this.
>
> 5) AFAIK, clustering algorithms take sequence files as input, there is no
> support for DB.
>
> 6) Yes, it is possible. Though you will have to write some code. See
> answer to question 3.
>
> 7) No, there is no refresh method sort of thing.
>
> HTH
>
> On 12-08-2012 22:58, arindam chakraborty wrote:
>
>> I am considering clustering (Canopy or k-means) to build a recommender but
>> I have following uncertainties. If someone can please clarify them, it will
>> be really helpful.
>>
>> My vector will be points of 8-dimensions. I will expect the clustering
>> phase to group close points in respective clusters. The output is where I
>> am stuck, as to how I can interpret them
>>
>> 1. Since main aim is to recommend similar objects, assumption is that
>>    points in the same cluster will be similar. So Is there a RECOMMENDER
>>    based on the clustering output, or I would have to build that logic
>>    manually
>> 2. Since output will have a list of vectors in one cluster (and they
>>    will not be unique) how do I identify them. i.e., which resulting point
>>    means which object, so that I know Object A, B, C are in the same
>>    cluster or not.
>> 3. For a new object P, is there a way to find out its cluster, or I will
>>    have to re-build the clusters all over again
>> 4. In a cluster, say I do identify an object P somehow, how can I figure
>>    out the closest n points to it. Is there any built-in method or I would
>>    have to write my own implementation
>> 5. Can I provide a data source like a DB to the cluster, so that it can
>>    work on the changed rows to fit them in their respective clusters. Or I
>>    would have to rebuild the clusters
>> 6. Can an object O be added to a cluster in real time? Can I find out
>>    its closest points from the cluster in real time. [SIMILAR TO POINT 3
>>    & 4 ]
>> 7. Does the cluster need to be rebuilt on every addition to my source
>>    data? Or it can identify the delta, and readjust it. Is there a
>>    refresh() method as there are for Recommenders?
>>
>> If you can answer one or more questions, it would be very useful.
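
For questions 3 and 4 in the quoted thread (finding the cluster of a new object P, and its closest n points), the hand-written logic could look roughly like the sketch below. This is plain Java and deliberately does not use any Mahout API: the class ClusterLookup and its methods are hypothetical names made up for illustration, the centroids are assumed to have been read out of the clustering output already, and Euclidean distance is an assumption (use whatever measure the clustering run used). The map keys play the role that the name of a NamedVector would play for question 2.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustration only: none of these names are Mahout classes or methods.
final class ClusterLookup {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Question 3: index of the centroid closest to the new point,
    // so the clusters do not have to be rebuilt to place it.
    static int nearestCentroid(double[][] centroids, double[] point) {
        if (centroids.length == 0) {
            throw new IllegalArgumentException("no centroids given");
        }
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double distance = squaredDistance(centroids[i], point);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    // Question 4: names of the n cluster members closest to the point,
    // nearest first. 'members' maps an object name to its vector.
    static List<String> closestNames(Map<String, double[]> members,
                                     double[] point, int n) {
        List<String> names = new ArrayList<>(members.keySet());
        names.sort(Comparator.comparingDouble(
                (String name) -> squaredDistance(members.get(name), point)));
        int count = Math.max(0, Math.min(n, names.size()));
        return new ArrayList<>(names.subList(0, count));
    }
}
```

With the 8-dimensional vectors from the question, nearestCentroid would be called once per new object, and closestNames only on the members of the cluster it returns, which keeps the neighbour search small. Sorting the whole cluster is the simplest option; for large clusters a bounded priority queue would avoid the full sort.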
