Re: Can clustering answer these questions

arindam chakraborty Sun, 12 Aug 2012 22:20:22 -0700

Hey Paritosh,

Thanks for the help. That does give a clearer picture.


On Mon, Aug 13, 2012 at 7:57 AM, Paritosh Ranjan <[email protected]> wrote:

> I can try to answer a few:
>
> 1) I don't know.
>
> 2) Use org.apache.mahout.math.NamedVector to identify clusters.
>
> 3) Yes, new points can be identified without clustering all over again. See
> org.apache.mahout.clustering.classify.ClusterClassifier
> org.apache.mahout.clustering.iterator.ClusterIterator
> org.apache.mahout.clustering.classify.ClusterClassificationDriver
>
> 4) I don't think there is any built-in implementation for this.
>
> 5) AFAIK, clustering algorithms take sequence files as input; there is no
> support for a DB.
>
> 6) Yes, it is possible, though you will have to write some code. See the
> answer to question 3.
>
> 7) No, there is nothing like a refresh method.
>
> HTH
>
>
> On 12-08-2012 22:58, arindam chakraborty wrote:
>
>> I am considering clustering (Canopy or k-means) to build a recommender, but
>> I have the following uncertainties. If someone can please clarify them, it
>> will be really helpful.
>>
>> My vectors will be 8-dimensional points. I expect the clustering phase to
>> group close points into their respective clusters. The output is where I
>> am stuck: I am not sure how to interpret it.
>>
>>
>>     1. Since the main aim is to recommend similar objects, the assumption is
>>        that points in the same cluster will be similar. Is there a
>>        recommender based on the clustering output, or would I have to build
>>        that logic manually?
>>     2. Since the output will have a list of vectors in one cluster (and they
>>        will not be unique), how do I identify them? That is, which resulting
>>        point corresponds to which object, so that I know whether objects A,
>>        B, and C are in the same cluster.
>>     3. For a new object P, is there a way to find out its cluster, or will I
>>        have to rebuild the clusters all over again?
>>     4. In a cluster, say I do identify an object P somehow. How can I figure
>>        out the closest n points to it? Is there a built-in method, or would
>>        I have to write my own implementation?
>>     5. Can I provide a data source like a DB to the clustering, so that it
>>        can work on the changed rows and fit them into their respective
>>        clusters? Or would I have to rebuild the clusters?
>>     6. Can an object O be added to a cluster in real time? Can I find its
>>        closest points in the cluster in real time? [Similar to points 3
>>        and 4.]
>>     7. Does the cluster need to be rebuilt on every addition to my source
>>        data, or can it identify the delta and readjust? Is there a
>>        refresh() method as there is for Recommenders?
>>
>>
>> If you can answer one or more questions, it would be very useful.
>>
>>
>
>
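
For questions 3 and 4 in the thread, the "some code" one would have to write
might look roughly like the sketch below. It is plain Java and deliberately
does not call any Mahout API: the class name ClusterLookup and its methods are
hypothetical, the centroids are assumed to have been exported from a finished
clustering run, and squared Euclidean distance is assumed as the measure. The
sample data is 2-dimensional only to keep it short; the same code works for
8-dimensional points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
public class ClusterLookup {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Question 3: index of the centroid closest to a new point, or -1 if there are none. */
    static int nearestCluster(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = squaredDistance(point, centroids[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    /** Question 4: names of the (at most) n cluster members closest to the query point. */
    static List<String> closestPoints(double[] query, Map<String, double[]> members, int n) {
        List<String> names = new ArrayList<>(members.keySet());
        // Sort member names by their distance to the query, nearest first.
        names.sort(Comparator.comparingDouble(
                (String name) -> squaredDistance(query, members.get(name))));
        return new ArrayList<>(names.subList(0, Math.min(n, names.size())));
    }

    public static void main(String[] args) {
        // Assumed centroids from an earlier clustering run (made-up values).
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[] p = {9.0, 8.0};
        System.out.println("cluster of P: " + nearestCluster(p, centroids));

        // Named members of that cluster; the names play the role that
        // NamedVector names would play in the answer to question 2.
        Map<String, double[]> members = new LinkedHashMap<>();
        members.put("A", new double[] {9.0, 9.0});
        members.put("B", new double[] {10.0, 12.0});
        members.put("C", new double[] {8.0, 8.5});
        System.out.println("closest 2 to P: " + closestPoints(p, members, 2));
    }
}
```

Keeping a name next to each vector is what makes the result readable as
"objects A and C", which is the same idea as using NamedVector. The linear
scan is fine for small clusters; for large ones a spatial index would
probably be needed.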
