Mahout's examples for clustering involve documents and bags of words.  I want 
to cluster items that include tree-structured and temporal attributes.

The tree-structured attributes are similar to Java package names (such as
org.apache.hadoop.hdfs.security.token.delegation). Two packages should
be considered close together if they share (long) prefixes.
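
To make the prefix idea concrete, here is a minimal sketch of one possible
distance on dotted paths. The function name prefix_distance and the
normalization by the longer path are assumptions made for illustration;
they are not part of Mahout.

```python
def prefix_distance(a: str, b: str) -> float:
    """Hypothetical distance in [0, 1] between two dotted paths.

    0.0 means the paths are identical; 1.0 means they share no leading
    segment. Longer shared prefixes give smaller distances.
    """
    parts_a = a.split(".")
    parts_b = b.split(".")

    # Count how many leading segments the two paths have in common.
    shared = 0
    for seg_a, seg_b in zip(parts_a, parts_b):
        if seg_a != seg_b:
            break
        shared += 1

    # Normalize by the longer path so the result stays in [0, 1].
    return 1.0 - shared / max(len(parts_a), len(parts_b))
```

With this choice, org.apache.hadoop.hdfs and org.apache.hadoop.mapred share
three of four segments, so they come out closer than two packages that only
share org.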

The temporal attributes have a periodic structure: Tuesday, Dec 12 at
2:13 PM is close to Tuesday, Dec 19 at 2:18 PM because they're both
on Tuesdays, they're both on workdays, and they're both around 3:00 PM.
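
A rough sketch of what such a periodic distance could look like follows.
The function names, the three equally weighted components, and the
Monday-to-Friday definition of a workday are all assumptions chosen for
illustration.

```python
from datetime import datetime


def circular_gap(x: float, y: float, period: float) -> float:
    """Shortest gap between two positions on a cycle of the given period."""
    d = abs(x - y) % period
    return min(d, period - d)


def temporal_distance(t1: datetime, t2: datetime,
                      w_day: float = 1.0,
                      w_time: float = 1.0,
                      w_workday: float = 1.0) -> float:
    """Hypothetical periodic distance; the weights are illustrative."""
    minutes1 = t1.hour * 60 + t1.minute
    minutes2 = t2.hour * 60 + t2.minute

    # Time of day wraps at midnight; the largest possible gap is 12 hours.
    time_gap = circular_gap(minutes1, minutes2, 24 * 60) / (12 * 60)

    # Day of week wraps after Sunday; the largest possible gap is 3 days.
    day_gap = circular_gap(t1.weekday(), t2.weekday(), 7) / 3

    # Assumption: Monday to Friday count as workdays (weekday() < 5).
    same_kind = (t1.weekday() < 5) == (t2.weekday() < 5)
    workday_gap = 0.0 if same_kind else 1.0

    return w_day * day_gap + w_time * time_gap + w_workday * workday_gap
```

Under these assumptions the two Tuesday timestamps above differ only by
five minutes of time of day, so their distance is close to zero no matter
how many weeks separate them.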

Is Mahout the right tool for clustering such items?

I was thinking that I could convert package paths to a sequence of
symbolic attributes: one for each position.  But that would seem to lose
information. And I could add derived attributes for times: day of week,
time of day, etc.
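
For illustration, the two conversions might look like the sketch below.
The function names, the fixed depth, and the sine/cosine encoding of time
of day are assumptions, not anything Mahout prescribes.

```python
import math
from datetime import datetime


def package_position_attributes(package: str, depth: int = 6) -> list:
    """One symbolic attribute per position, padded with "" up to depth.

    Paths deeper than depth are truncated (an illustrative choice).
    """
    parts = package.split(".")
    return [parts[i] if i < len(parts) else "" for i in range(depth)]


def mismatch_count(attrs_a: list, attrs_b: list) -> int:
    """Number of positions at which two attribute lists differ."""
    return sum(1 for x, y in zip(attrs_a, attrs_b) if x != y)


def time_attributes(t: datetime) -> dict:
    """Derived attributes: day of week, workday flag, cyclic time of day."""
    angle = 2 * math.pi * (t.hour * 60 + t.minute) / (24 * 60)
    return {
        "day_of_week": t.weekday(),       # 0 = Monday ... 6 = Sunday
        "is_workday": t.weekday() < 5,    # assumption: Mon-Fri are workdays
        "time_sin": math.sin(angle),      # sin/cos keep 23:59 next to 00:00
        "time_cos": math.cos(angle),
    }
```

One way the information loss shows up: if positions are compared
independently, com.apache.hadoop.hdfs and org.apache.hadoop.hdfs differ in
one position, exactly like org.apache.hadoop.hdfs and
org.apache.hadoop.mapred, even though the first pair shares no prefix.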

I can easily define a distance function on items.  Can Mahout cluster based
just on a distance function?
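
For reference, this is the kind of algorithm I have in mind: a small
k-medoids sketch that only ever calls the distance function and never
needs vectors or centroids. It is plain Python written for illustration,
not Mahout code, and the name k_medoids and the choice of the first k items
as initial medoids are assumptions.

```python
def k_medoids(items, distance, k, max_iter=100):
    """Cluster items using only a pairwise distance function.

    Illustrative sketch: alternates between assigning each item to its
    nearest medoid and re-picking each cluster's medoid as the member with
    the smallest total distance to the other members.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")

    medoids = list(range(k))  # assumption: first k items seed the clusters
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, item in enumerate(items):
            nearest = min(medoids, key=lambda m: distance(item, items[m]))
            clusters[nearest].append(i)

        # Update step: pick the most central member of each cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)  # keep the medoid of an empty cluster
                continue
            best = min(
                members,
                key=lambda c: sum(distance(items[c], items[j]) for j in members),
            )
            new_medoids.append(best)

        if set(new_medoids) == set(medoids):
            break  # medoids stopped changing
        medoids = new_medoids

    return [[items[i] for i in members] for members in clusters.values()]
```

Each iteration evaluates the distance many times (quadratic in the size of
a cluster during the update step), so this only shows the idea and says
nothing about how it would scale.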

 Thanks, Don
