Bikash,

Don't use that version. Use a more recent release. We can't help that Cloudera has an old version.

On Tue, Feb 18, 2014 at 1:26 AM, Bikash Gupta <[email protected]> wrote:

> Suneel,
>
> Thanks for the information.
>
> I am using 0.7 packaged with CDH.
>
> On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi <[email protected]> wrote:
>
> > On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta <[email protected]> wrote:
> >
> > Ted/Peter,
> >
> > Thanks for the response.
> >
> > This is exactly what I am trying to achieve. Maybe I was not able to
> > put my questions clearly.
> >
> > I am clustering on a few variables of Customer/User (except their
> > customer_id/user_id) and storing the customer_id/user_id list in a
> > separate place.
> >
> > Question) What is the approach to identify each member in each cluster
> > by its unique id?
> > Answer) I have to run a script post-clustering to map
> > customer_id/user_id for the clustered output to identify the member
> > uniquely.
> >
> >>> If u r working off of Mahout 0.9 u don't have to do that. The
> Clustered output should display the vectors with the vectorid (user_id in
> ur case) that belong to a specific cluster along with the distance of that
> vector from the cluster center.
> >
> > Correct me if I am wrong :)
> >
> >
> > On Tue, Feb 18, 2014 at 10:53 AM, Ted Dunning <[email protected]> wrote:
> >> Bikash,
> >>
> >> Peter is just right.
> >>
> >> Yes, you can cluster on these few variables that you have. Probably you
> >> should translate location to x,y,z coordinates so that you don't have
> >> strange geometry problems, but location, gender and age are quite
> >> reasonable characteristics. You will get a fairly weak clustering since
> >> these characteristics actually tell very little about people, but it is a
> >> start.
> >>
> >> You *don't* want to cluster using user ID for exactly the reasons that
> >> Peter mentioned. Another way to put it is that the user ID tells you
> >> absolutely nothing about the person and thus is not useful for the
> >> clustering.
> >>
> >> You *do* have to retain the assignment of users to cluster and that
> >> assignment is usually stored as a list of user IDs for each cluster. This
> >> does not at all imply, however, that the user ID was used to *form* the
> >> cluster.
> >>
> >>
> >> On Mon, Feb 17, 2014 at 9:01 PM, Peter Jaumann <[email protected]> wrote:
> >>
> >>> Bikash,
> >>> As Ted pointed out already......
> >>> You can cluster on all variables except your customer_id. That's your
> >>> identifier.
> >>> Customers within a cluster are 'similar'; how similar depends on the
> >>> fidelity of your clustering.
> >>> The clustering algorithm uses (you'll choose) some kind of distance, or
> >>> similarity/dissimilarity measure (which one to use depends on the type
> >>> of data you have). This measure will, eventually, determine how
> >>> separate/how unique your clusters are. Goal is to have your clusters
> >>> distinct from each other but have the cluster members, within a
> >>> cluster, as similar as possible.
> >>>
> >>> In the output, each member in each cluster is uniquely identified by its
> >>> customer_id, the cluster it belongs to, and a distance measure that
> >>> shows (usually) how close, or not, the customer_id is from its cluster
> >>> center.
> >>>
> >>> In terms of what you want to do, my assumption is that you'd like to find
> >>> out a structure, or patterns, within your customer base, based on a set
> >>> of variables that you have. This is often called a segmentation.
> >>>
> >>> Hope this helps! What you want to do is a pretty basic and
> >>> straightforward application of clustering.
> >>> Good luck,
> >>> -Peter
> >>>
> >>>
> >>> On Mon, Feb 17, 2014 at 9:48 PM, Bikash Gupta <[email protected]> wrote:
> >>>
> >>> > Basically I am trying to achieve customer segmentation.
> >>> >
> >>> > Now to measure customer similarity within a cluster I need to
> >>> > understand which two customers are similar.
> >>> >
> >>> > Assumption: To understand these customers uniquely I need to provide
> >>> > their CustomerId
> >>> >
> >>> > Is my assumption correct? If yes then, will customerId affect the
> >>> > clustering output
> >>> >
> >>> > If no then how can I identify customer uniquely
> >>> >
> >>> > On Tue, Feb 18, 2014 at 2:55 AM, Ted Dunning <[email protected]> wrote:
> >>> > > That really depends on what you want to do.
> >>> > >
> >>> > > What is it that you want?
> >>> > >
> >>> > >
> >>> > > On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta <[email protected]> wrote:
> >>> > >
> >>> > >> Ok...so UserId is not a good field for this combination, but if I want
> >>> > >> User Clustering, what should be combination (just for
> >>> > >> understanding).....
> >>> > >>
> >>> > >> On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning <[email protected]> wrote:
> >>> > >> > On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <[email protected]> wrote:
> >>> > >> >
> >>> > >> >> Let's say I am clustering users, I am providing their profile data to
> >>> > >> >> discover similarity between two users.
> >>> > >> >>
> >>> > >> >> So my input would be [UserId, Location, Age, Gender, Time Created]
> >>> > >> >>
> >>> > >> >> Now if my UserId length is of minimum 10 characters which is
> >>> > >> >> comparatively very large number than other categorical data.
> >>> > >> >>
> >>> > >> >
> >>> > >> > User id is not a good field for clustering.
> >>> > >> >
> >>> > >> > Location is fine if you want geographical clustering.
> >>> > >> >
> >>> > >> > Location + age + gender is fine for geo-demo-graphical clustering.
> >>> > >> >
> >>> > >> > Adding time created might give a tiny bit of insight.
> >>> > >> >
> >>> > >> > But these fields are not going to lead to great insights.
> >>> > >>
> >>> > >>
> >>> > >> --
> >>> > >> Thanks & Regards
> >>> > >> Bikash Kumar Gupta
> >>> > >>
> >>> >
> >>> > --
> >>> > Thanks & Regards
> >>> > Bikash Kumar Gupta
> >>> >
> >
> > --
> > Thanks & Regards
> > Bikash Kumar Gupta
> >
>
> --
> Thanks & Regards
> Bikash Kumar Gupta
>
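
For what it's worth, the approach discussed in the quoted thread (cluster on location, age and gender, keep the user ID out of the feature vector, and re-attach it afterwards) might look roughly like the Python sketch below. This is not Mahout code and does not use any Mahout API. The sample records, the `latlon_to_xyz` and `kmeans` helpers, the feature scaling, and the choice of the first k rows as seeds are all assumptions made purely for illustration.

```python
import math

def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centers."""
    centers = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center (Euclidean distance).
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assignment

# Hypothetical records: (user_id, latitude, longitude, age, gender).
users = [
    ("u0000000001", 28.61, 77.21, 25, "M"),
    ("u0000000002", 40.71, -74.01, 52, "M"),
    ("u0000000003", 28.70, 77.10, 27, "F"),
    ("u0000000004", 40.73, -73.99, 49, "F"),
]

# The ID list is kept aside, in the same order as the feature rows,
# and never enters the feature vector.
ids = [u[0] for u in users]

# Feature vector: x, y, z, scaled age, gender flag (scaling is arbitrary here).
features = [
    latlon_to_xyz(lat, lon) + (age / 100.0, 1.0 if gender == "F" else 0.0)
    for _, lat, lon, age, gender in users
]

centers, assignment = kmeans(features, k=2)

# Re-attach IDs after clustering: cluster -> [(user_id, distance to center)].
clusters = {}
for uid, point, c in zip(ids, features, assignment):
    clusters.setdefault(c, []).append((uid, math.dist(point, centers[c])))

for c, members in sorted(clusters.items()):
    print(c, members)
```

The ID only ever appears in `ids` and in the final `clusters` mapping, so it identifies members without influencing how the clusters are formed. With real data the relative scaling of the features and the seeding would need more care than this toy version gives them.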
