For questions 1 and 2, you might want to look at https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html, specifically the rowid and rowsimilarity jobs.
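
To make the idea behind those two jobs concrete, here is a minimal plain-Java sketch. It is not Mahout's API: the class name RowSimilaritySketch, the methods cosine and topSimilar, and the object ids "A", "B", "C" are all made up for illustration. It assumes each object's vector is stored under its own id, so a result can always be traced back to the object (question 2), and that "recommend similar objects" means ranking the other rows by cosine similarity (question 1).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names come from Mahout.
public class RowSimilaritySketch {

    /** Cosine similarity of two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the ids of the n rows most similar to the row stored under queryId. */
    static List<String> topSimilar(Map<String, double[]> rows, String queryId, int n) {
        double[] query = rows.get(queryId);
        List<String> ids = new ArrayList<>();
        for (String id : rows.keySet()) {
            if (!id.equals(queryId)) {
                ids.add(id); // every row except the query itself is a candidate
            }
        }
        // Sort by descending similarity (negated so the largest comes first).
        ids.sort(Comparator.comparingDouble((String id) -> -cosine(query, rows.get(id))));
        return ids.subList(0, Math.min(n, ids.size()));
    }

    public static void main(String[] args) {
        // Hypothetical 8-dimensional vectors keyed by object id.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("A", new double[] {1, 0, 0, 0, 0, 0, 0, 0});
        rows.put("B", new double[] {1, 1, 0, 0, 0, 0, 0, 0});
        rows.put("C", new double[] {0, 0, 0, 0, 0, 0, 0, 1});
        System.out.println(topSimilar(rows, "A", 2));
    }
}
```

This brute-force ranking compares the query against every other row, which is fine for a small data set; the point of running it as a distributed job is to do the same comparison at scale.
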
On Sun, Aug 12, 2012 at 7:28 PM, arindam chakraborty <[email protected]> wrote:

> I am considering clustering (Canopy or k-means) to build a recommender but
> I have following uncertainties. If someone can please clarify them, it will
> be really helpful.
>
> My vector will be points of 8-dimensions. I will expect the clustering
> phase to group close points in respective clusters. The output is where I
> am stuck, as to how I can interpret them
>
>    1. Since main aim is to recommend similar objects, assumption is that
>       points in the same cluster will be similar. So Is there a RECOMMENDER
>       based on the clustering output, or I would have to build that logic
>       manually
>    2. Since output will have a list of vectors in one cluster (and they
>       will not be unique) how do I identify them. i.e., which resulting
>       point means which object, so that I know Object A, B, C are in the
>       same cluster or not.
>    3. For a new object P, is there a way to find out its cluster, or I will
>       have to re-build the clusters all over again
>    4. In a cluster, say I do identify an object P somehow, how can I figure
>       out the closest n points to it. Is there any built-in method or I
>       would have to write my own implementation
>    5. Can I provide a data source like a DB to the cluster, so that it can
>       work on the changed rows to fit them in their respective clusters. Or
>       I would have to rebuild the clusters
>    6. Can an object O be added to a cluster in real time? Can I find out
>       its closest points from the cluster in real time. [SIMILAR TO POINT 3
>       & 4 ]
>    7. Does the cluster need to be rebuilt on every addition to my source
>       data? Or it can identify the delta, and readjust it. Is there a
>       refresh() method as there are for Recommenders?
>
> If you can answer one or more questions, it would be very useful.
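
On questions 3 and 4 above: with k-means output, a new point can generally be placed by measuring its distance to each existing centroid and picking the nearest one, with no need to re-run the clustering for that single point. The sketch below shows that step, plus a brute-force search for the n closest members of a cluster. It is plain Java rather than Mahout's API; the class NearestCentroidSketch and its method names are hypothetical, Euclidean distance is an assumption, and the 2-dimensional sample data is only for readability (the same code works for 8 dimensions).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; none of these names come from Mahout.
public class NearestCentroidSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to p (the k-means assignment step). */
    static int nearestCentroid(double[][] centroids, double[] p) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (squaredDistance(centroids[i], p) < squaredDistance(centroids[best], p)) {
                best = i;
            }
        }
        return best;
    }

    /** The n members of one cluster that lie closest to p, nearest first. */
    static List<double[]> nearestMembers(List<double[]> members, double[] p, int n) {
        List<double[]> sorted = new ArrayList<>(members);
        sorted.sort(Comparator.comparingDouble((double[] m) -> squaredDistance(m, p)));
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        // Hypothetical centroids taken from an earlier clustering run.
        double[][] centroids = {{0, 0}, {10, 10}};
        double[] p = {9, 8};
        System.out.println("cluster index: " + nearestCentroid(centroids, p));

        // Hypothetical members of the cluster around {10, 10}.
        List<double[]> members = new ArrayList<>();
        members.add(new double[] {12, 12});
        members.add(new double[] {9, 9});
        members.add(new double[] {10, 11});
        for (double[] m : nearestMembers(members, p, 2)) {
            System.out.println(m[0] + ", " + m[1]);
        }
    }
}
```

Assigning points this way leaves the centroids unchanged, so after many additions they may no longer reflect the data; re-clustering periodically is the usual remedy.
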
