I'm clustering (non-textual) data. Some of the features in my vectors represent
discrete values or "types". For example, one feature may take the values
0="red", 1="blue", 2="green", 3="yellow".

I could also have represented the same data as 4 binary features, one per
color, where each feature is either 0 or 1. For example, a 1 in the "blue"
feature would mean the color is blue.

One distinction between these two approaches is that the first approach creates
dense vectors, whereas the second approach creates sparse vectors.
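
To make the two representations concrete, here is a minimal Python sketch. The
function names are hypothetical, made up for this illustration, and squared
Euclidean distance is only an assumed example of a distance measure, not
necessarily the one I should be using.

```python
# The four "types" from the example above, in index order 0..3.
COLORS = ["red", "blue", "green", "yellow"]

def encode_as_code(color):
    """Approach 1: a single feature holding the integer code of the color."""
    return [COLORS.index(color)]

def encode_as_one_hot(color):
    """Approach 2: four 0/1 features, one per color."""
    return [1 if c == color else 0 for c in COLORS]

def squared_euclidean(a, b):
    """Assumed example measure: sum of squared per-feature differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Compare "red" against every other color under both encodings.
for other in ["blue", "green", "yellow"]:
    d_code = squared_euclidean(encode_as_code("red"), encode_as_code(other))
    d_hot = squared_euclidean(encode_as_one_hot("red"), encode_as_one_hot(other))
    print(other, d_code, d_hot)
```

Under this assumed measure, the single-feature encoding makes "red" closer to
"blue" than to "yellow" simply because of how the codes were numbered, while
the 0/1 encoding puts every pair of distinct colors at the same distance.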

My question is: from the point of view of getting accurate clusters, is it
better to represent the type values one way or the other? As a follow-up, for
the recommended representation (if it's possible to generalize), what would be
the suggested clustering algorithm and distance measure?

I am new to this, so feel free to be basic in your response!
