I am considering clustering (Canopy or k-means) to build a recommender, but I have the following uncertainties. If someone could clarify them, it would be really helpful.
My vectors will be 8-dimensional points. I expect the clustering phase to group nearby points into the same cluster. Where I am stuck is how to interpret the output:

1. Since the main aim is to recommend similar objects, the assumption is that points in the same cluster are similar. Is there a Recommender that works on the clustering output, or would I have to build that logic manually?
2. The output will be a list of vectors per cluster (and the vectors will not be unique), so how do I identify them? That is, how do I tell which resulting point corresponds to which object, so that I know whether objects A, B and C are in the same cluster?
3. For a new object P, is there a way to find out its cluster, or will I have to rebuild the clusters all over again?
4. Suppose I do identify an object P in a cluster somehow. How can I find the n closest points to it? Is there a built-in method, or would I have to write my own implementation?
5. Can I provide a data source such as a DB to the clustering, so that it works on the changed rows and fits them into their respective clusters? Or would I have to rebuild the clusters?
6. Can an object O be added to a cluster in real time? Can I find its closest points in the cluster in real time? (Similar to points 3 and 4.)
7. Do the clusters need to be rebuilt on every addition to my source data, or can the delta be identified and the clusters readjusted? Is there a refresh() method, as there is for Recommenders?

If you can answer one or more of these questions, it would be very useful.
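
To make the questions about identifying objects, placing a new object P, and finding its n closest points more concrete, here is a minimal sketch of the manual logic I would otherwise end up writing. It is plain Python and not any library's API. The object ids, the sample points, the two centroids and the helper names (`euclidean`, `nearest_cluster`, `closest_n`) are all made up for illustration, and the centroids are assumed to come from an earlier clustering run.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))

def closest_n(object_id, member_ids, points, n):
    """The n members of a cluster closest to object_id, nearest first."""
    target = points[object_id]
    others = [oid for oid in member_ids if oid != object_id]
    others.sort(key=lambda oid: euclidean(target, points[oid]))
    return others[:n]

# Hypothetical 8-dimensional points, keyed by object id so that each
# vector can always be traced back to its object.
points = {
    "A": (0, 0, 0, 0, 0, 0, 0, 0),
    "B": (1, 0, 0, 0, 0, 0, 0, 0),
    "C": (0, 2, 0, 0, 0, 0, 0, 0),
    "D": (10, 10, 10, 10, 10, 10, 10, 10),
    "E": (11, 10, 10, 10, 10, 10, 10, 10),
}

# Assumed output of a previous clustering run (made-up centroids).
centroids = [
    (0.5, 0.5, 0, 0, 0, 0, 0, 0),
    (10.5, 10, 10, 10, 10, 10, 10, 10),
]

# Keep cluster membership as object ids rather than raw vectors.
clusters = {}
for oid, vec in points.items():
    clusters.setdefault(nearest_cluster(vec, centroids), []).append(oid)

# A new object P is assigned to the nearest existing centroid
# without rebuilding the clusters.
points["P"] = (1, 0.5, 0, 0, 0, 0, 0, 0)
p_cluster = nearest_cluster(points["P"], centroids)
clusters[p_cluster].append("P")

# The 2 closest objects to P inside its own cluster.
neighbours = closest_n("P", clusters[p_cluster], points, 2)
print(p_cluster, neighbours)
```

Assigning P to the nearest centroid this way leaves the centroids untouched, so after enough additions they would drift away from the data. That is exactly why I am asking whether a rebuild or some kind of refresh is needed.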
