I found an interesting paper that I thought someone here might find helpful.
http://www.milanmirkovic.com/wp-content/uploads/2012/10/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf

ABSTRACT: ... A wide variety of distance functions and similarity measures have been used for clustering, such as squared Euclidean distance, cosine similarity, and relative entropy. In this paper, we compare and analyze the effectiveness of these measures in partitional clustering for text document datasets. Our experiments utilize the standard K-means algorithm and we report results on seven text document datasets and five distance/similarity measures that have been most commonly used in text clustering.

TL;DR: For text documents, favor Cosine, Jaccard/Tanimoto, or Pearson over Euclidean distance measures.
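To make the TL;DR a bit more concrete, here is a rough sketch (mine, not from the paper) of why Euclidean distance can mislead on term-count vectors. The three toy documents and the function names are made up for illustration: doc_b is doc_a repeated three times (same topic, just longer), while doc_c has a different word mix but a similar length to doc_a.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def euclidean(u, v):
    # Straight-line distance; sensitive to document length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine(u, v):
    # Angle-based similarity; ignores vector magnitude.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def tanimoto(u, v):
    # Extended Jaccard coefficient for real-valued vectors.
    d = dot(u, v)
    return d / (dot(u, u) + dot(v, v) - d)

# Hypothetical term counts over a 4-word vocabulary.
doc_a = [2, 1, 0, 1]
doc_b = [6, 3, 0, 3]   # doc_a tripled: same topic, longer text
doc_c = [1, 0, 2, 1]   # different word mix, similar length to doc_a

for name, other in (("b", doc_b), ("c", doc_c)):
    print(
        f"a vs {name}: "
        f"euclidean={euclidean(doc_a, other):.3f} "
        f"cosine={cosine(doc_a, other):.3f} "
        f"tanimoto={tanimoto(doc_a, other):.3f}"
    )
```

With these toy numbers, Euclidean distance ranks doc_c as closer to doc_a than doc_b is, purely because doc_b is longer, whereas cosine and Tanimoto both rank doc_b as the more similar one. Real pipelines would typically use TF-IDF weights rather than raw counts, so treat this only as an intuition aid.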
