I found an interesting paper that I thought someone here might find helpful.

http://www.milanmirkovic.com/wp-content/uploads/2012/10/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf

ABSTRACT: ... A wide variety of distance functions and similarity
measures have been used for clustering, such as squared Euclidean
distance, cosine similarity, and relative entropy. In this paper, we
compare and analyze the effectiveness of these measures in partitional
clustering for text document datasets. Our experiments utilize the
standard K-means algorithm and we report results on seven text
document datasets and five distance/similarity measures that have been
most commonly used in text clustering.

TL;DR: For text documents, favor cosine similarity, Jaccard/Tanimoto, or
Pearson correlation over Euclidean distance.
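
To make the TL;DR concrete, here is a small sketch (mine, not from the paper) of four of the measures on toy term-count vectors. The three "documents" and all function names are made up for illustration, and I use the extended (Tanimoto) form of Jaccard for real-valued vectors. It only shows one intuition, that Euclidean distance is sensitive to document length; it does not reproduce the paper's K-means experiments.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def tanimoto_similarity(a, b):
    # Extended Jaccard coefficient for real-valued vectors.
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)

def pearson_correlation(a, b):
    # Assumes neither vector is constant (non-zero variance).
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    return dot(da, db) / (math.sqrt(dot(da, da)) * math.sqrt(dot(db, db)))

# Hypothetical term counts over a 4-word vocabulary.
short_doc = [2, 1, 0, 0]    # short document on topic A
long_doc = [20, 10, 0, 0]   # same topic A, ten times longer
other_doc = [0, 0, 2, 1]    # topic B, same length as short_doc

measures = [
    ("euclidean distance", euclidean_distance),
    ("cosine similarity", cosine_similarity),
    ("tanimoto similarity", tanimoto_similarity),
    ("pearson correlation", pearson_correlation),
]
for name, fn in measures:
    same_topic = fn(short_doc, long_doc)
    diff_topic = fn(short_doc, other_doc)
    print(f"{name}: same topic = {same_topic:.3f}, different topic = {diff_topic:.3f}")
```

With these toy vectors, Euclidean distance puts the short document closer to the off-topic one than to the long on-topic one, while the three similarity measures all rank the on-topic pair higher.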
