Bikash,

Peter is exactly right.
Yes, you can cluster on the few variables that you have. You should probably convert location to x, y, z coordinates so that you don't run into strange geometry problems (raw latitude/longitude distorts distances near the poles and wraps around at ±180° longitude), but location, gender and age are quite reasonable characteristics. You will get a fairly weak clustering, since these characteristics actually tell you very little about people, but it is a start.

You *don't* want to cluster using user ID, for exactly the reasons that Peter mentioned. Another way to put it is that the user ID tells you absolutely nothing about the person and thus is not useful for the clustering.

You *do* have to retain the assignment of users to clusters, and that assignment is usually stored as a list of user IDs for each cluster. This does not at all imply, however, that the user ID was used to *form* the cluster.

On Mon, Feb 17, 2014 at 9:01 PM, Peter Jaumann <[email protected]> wrote:

> Bikash,
> As Ted pointed out already...
> You can cluster on all variables except your customer_id. That's your
> identifier.
> Customers within a cluster are 'similar'; how similar depends on the
> fidelity of your clustering.
> The clustering algorithm uses (you'll choose) some kind of distance, or
> similarity/dissimilarity measure (which one to use depends on the type
> of data you have). This measure will, eventually, determine how
> separate/how unique your clusters are. The goal is to have your clusters
> distinct from each other but have the cluster members, within a cluster,
> as similar as possible.
>
> In the output, each member in each cluster is uniquely identified by its
> customer_id, the cluster it belongs to, and a distance measure that
> shows (usually) how close, or not, the customer_id is to its cluster
> center.
>
> In terms of what you want to do, my assumption is that you'd like to
> find a structure, or patterns, within your customer base, based on a set
> of variables that you have. This is often called a segmentation.
>
> Hope this helps! What you want to do is a pretty basic and
> straightforward application of clustering.
> Good luck,
> -Peter
>
> On Mon, Feb 17, 2014 at 9:48 PM, Bikash Gupta <[email protected]> wrote:
>
> > Basically I am trying to achieve customer segmentation.
> >
> > Now to measure customer similarity within a cluster I need to
> > understand which two customers are similar.
> >
> > Assumption: To identify these customers uniquely I need to provide
> > their CustomerId.
> >
> > Is my assumption correct? If yes, then will CustomerId affect the
> > clustering output?
> >
> > If no, then how can I identify customers uniquely?
> >
> > On Tue, Feb 18, 2014 at 2:55 AM, Ted Dunning <[email protected]> wrote:
> > > That really depends on what you want to do.
> > >
> > > What is it that you want?
> > >
> > > On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta <[email protected]> wrote:
> > >
> > >> Ok... so UserId is not a good field for this combination, but if I
> > >> want User Clustering, what should the combination be (just for
> > >> understanding)?
> > >>
> > >> On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning <[email protected]> wrote:
> > >> > On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <[email protected]> wrote:
> > >> >
> > >> >> Let's say I am clustering users, and I am providing their profile
> > >> >> data to discover similarity between two users.
> > >> >>
> > >> >> So my input would be [UserId, Location, Age, Gender, Time Created]
> > >> >>
> > >> >> Now my UserId has a minimum length of 10 characters, which is a
> > >> >> comparatively very large number next to the other categorical data.
> > >> >>
> > >> >
> > >> > User id is not a good field for clustering.
> > >> >
> > >> > Location is fine if you want geographical clustering.
> > >> >
> > >> > Location + age + gender is fine for geo-demographic clustering.
> > >> >
> > >> > Adding time created might give a tiny bit of insight.
> > >> >
> > >> > But these fields are not going to lead to great insights.
> > >>
> > >> --
> > >> Thanks & Regards
> > >> Bikash Kumar Gupta
> >
> > --
> > Thanks & Regards
> > Bikash Kumar Gupta
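
For what it's worth, here is a rough Python sketch of the advice at the top of this message: convert location to x, y, z on a unit sphere, build the feature vector without the user ID, and keep the IDs only to record which users landed in which cluster. Everything specific in it is an assumption made for illustration: the field names, the sample users and their IDs, the age scaling, the binary gender encoding, and the tiny hand-rolled k-means (seeded with the first k points so the run is repeatable). It is not how any particular library does it.

```python
import math

def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def features(user):
    """Build the feature vector; the user ID is deliberately left out."""
    x, y, z = latlon_to_xyz(user["lat"], user["lon"])
    age = user["age"] / 100.0                        # crude scaling (assumption)
    gender = 1.0 if user["gender"] == "F" else 0.0   # simplistic encoding (assumption)
    return (x, y, z, age, gender)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means, seeded with the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels

# Hypothetical users; the IDs are made up.
users = [
    {"id": "U000000001", "lat": 40.7, "lon": -74.0, "age": 25, "gender": "F"},
    {"id": "U000000002", "lat": -33.9, "lon": 151.2, "age": 47, "gender": "M"},
    {"id": "U000000003", "lat": 40.8, "lon": -73.9, "age": 31, "gender": "M"},
    {"id": "U000000004", "lat": -33.8, "lon": 151.1, "age": 52, "gender": "F"},
]

labels = kmeans([features(u) for u in users], k=2)

# The IDs are used only here, to record the assignment after clustering.
clusters = {}
for user, label in zip(users, labels):
    clusters.setdefault(label, []).append(user["id"])

print(clusters)
```

With only these few weak features, the relative scaling of age and gender against the coordinates largely decides what the clusters look like, so treat those two scaling choices as knobs to experiment with rather than as recommendations.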
