Bikash,
As Ted already pointed out:
You can cluster on all variables except your customer_id. That is your
identifier, not a feature.
Customers within a cluster are 'similar'; how similar depends on the
quality of your clustering.
The clustering algorithm uses some kind of distance, or
similarity/dissimilarity, measure. You choose it, and which one to use
depends on the type of data you have. This measure ultimately determines
how separate and how distinct your clusters are. The goal is to have
clusters that are distinct from each other while the members within each
cluster are as similar as possible.
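
To make the "leave out customer_id, then measure distance" idea concrete,
here is a minimal sketch in plain Python. The records, the field names
(age, monthly_spend), and the choice of min-max scaling with Euclidean
distance are all assumptions for illustration; your own data may call for
a different measure.

```python
import math

# Hypothetical customer records; the field names are illustrative only.
customers = [
    {"customer_id": "C0000000001", "age": 23, "monthly_spend": 40.0},
    {"customer_id": "C0000000002", "age": 25, "monthly_spend": 45.0},
    {"customer_id": "C0000000003", "age": 61, "monthly_spend": 300.0},
]

# customer_id is deliberately left out: it identifies a row, it does not describe it.
FEATURES = ["age", "monthly_spend"]


def min_max_scale(rows, features):
    """Rescale each feature to [0, 1] so no single field dominates the distance."""
    lo = {f: min(r[f] for r in rows) for f in features}
    hi = {f: max(r[f] for r in rows) for f in features}
    scaled = []
    for r in rows:
        vec = [
            (r[f] - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
            for f in features
        ]
        # Keep the id next to the vector so results can be traced back later.
        scaled.append((r["customer_id"], vec))
    return scaled


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


points = min_max_scale(customers, FEATURES)
ids = [cid for cid, _ in points]
vecs = [v for _, v in points]

d_12 = euclidean(vecs[0], vecs[1])  # two customers with similar profiles
d_13 = euclidean(vecs[0], vecs[2])  # two customers with different profiles
```

With these made-up values, d_12 comes out smaller than d_13, which is all
"similar" means to the algorithm. The id is carried alongside each vector
but never enters the distance calculation.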

In the output, each member of each cluster is uniquely identified by its
customer_id, along with the cluster it belongs to
and a distance measure that (usually) shows how close the
customer is to its cluster center.
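
As a rough illustration of that output shape, the sketch below runs a
bare-bones k-means in plain Python and returns one
(customer_id, cluster, distance-to-center) row per customer. The ids, the
already-scaled feature vectors, the fixed starting centers, and the
function names are all made up for the example; this is not the output
format of any particular library.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(ids, vectors, initial_centers, max_iter=20):
    """Simple k-means; ids are only carried through, never used as features."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: euclidean(v, centers[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers are too
        assignment = new_assignment
        # Move each center to the mean of its current members.
        for k in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centers[k] = mean(members)
    return [
        (cid, a, euclidean(v, centers[a]))
        for cid, v, a in zip(ids, vectors, assignment)
    ]


# Hypothetical data: ids plus feature vectors already scaled to [0, 1].
ids = ["C001", "C002", "C003", "C004"]
vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]

rows = kmeans(ids, vectors, initial_centers=[vectors[0], vectors[2]])
for customer_id, cluster, distance in rows:
    print(customer_id, cluster, round(distance, 3))
```

Two customers are "similar" in this sense when they share a cluster label,
and the distance column tells you how typical each one is of its cluster.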

In terms of what you want to do, my assumption is that you'd like to find
structure, or patterns,
within your customer base, based on a set of variables that you have. This
is often called segmentation.

Hope this helps! What you want to do is a pretty basic and straightforward
application of clustering.
Good luck,
-Peter



On Mon, Feb 17, 2014 at 9:48 PM, Bikash Gupta <[email protected]> wrote:

> Basically I am trying to achieve customer segmentation.
>
> Now to measure customer similarity within a cluster I need to
> understand which two customer are similar.
>
> Assumption: To understand these customer uniquely I need to provide
> their CustomerId
>
> Is my assumption correct? If yes then, will customerId affect the
> clustering output
>
> If no then how can I identify customer uniquely
>
> On Tue, Feb 18, 2014 at 2:55 AM, Ted Dunning <[email protected]>
> wrote:
> > That really depends on what you want to do.
> >
> > What is it that you want?
> >
> >
> > On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta <[email protected]
> >wrote:
> >
> >> Ok...so UserId is not a good field for this combination, but if I want
> >> User Clustering, what should be combination(just for
> >> understanding).....
> >>
> >> On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning <[email protected]>
> >> wrote:
> >> > On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <
> [email protected]
> >> >wrote:
> >> >
> >> >> Let say I am clustering users, I am providing their profile data to
> >> >> discover similarity between two user.
> >> >>
> >> >> So my input would be [UserId, Location, Age, Gender, Time Created ]
> >> >>
> >> >> Now if my UserId length is of minimum 10 characters which is
> >> >> comparative very large number than other categorical data.
> >> >>
> >> >
> >> > User id is not a good field for clustering.
> >> >
> >> > Location is fine if you want geo-graphical clustering.
> >> >
> >> > Location + age + gender is fine for geo-demo-graphical clustering.
> >> >
> >> > Adding time created might give a tiny bit of insight.
> >> >
> >> > But these fields are not going to lead to great insights.
> >>
> >>
> >>
> >> --
> >> Thanks & Regards
> >> Bikash Kumar Gupta
> >>
>
>
>
> --
> Thanks & Regards
> Bikash Kumar Gupta
>
