>> So it is not a "left child-right sibling representation", right?
>
> I am not sure, this is not a term that I am familiar with. Keep in mind
> that Ward gives a binary tree, so it would be more a "left child-right
> child representation".
>
> This matrix simply lists the pairs of children for each node, where a node
> is denoted by an integer index. It does not include the terminal nodes
> (original samples) as they have no children.

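To make the quoted description concrete, here is a minimal sketch of how such a children matrix can be read. It assumes (this is not stated in the thread) that leaves are numbered 0 to n_samples - 1 and that row i holds the two children of node n_samples + i. The helper names and the small tree are made up for illustration.

```python
def describe_merges(children, n_samples):
    """Return one (node_id, left_child, right_child) tuple per non-leaf node.

    Assumption: row i of `children` describes node n_samples + i.
    """
    return [(n_samples + i, int(left), int(right))
            for i, (left, right) in enumerate(children)]


def leaves_under(node, children, n_samples):
    """Collect the original sample indices found below `node`."""
    if node < n_samples:
        # Indices below n_samples are terminal nodes (original samples).
        return [node]
    left, right = children[node - n_samples]
    return (leaves_under(left, children, n_samples)
            + leaves_under(right, children, n_samples))


# Hypothetical tree over 4 samples, not taken from a real fit.
children = [[0, 1], [2, 3], [4, 5]]
n_samples = 4

for node_id, left, right in describe_merges(children, n_samples):
    print(node_id, "has children", left, "and", right)
```

Under that assumption the last row is always the root, and ``leaves_under`` recovers which samples belong to any internal node.
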
Thanks Gael for clearing that up.

>> Are there any pointers to that specific format or even better does
>> anyone have some advice on how to visualize the tree with
>> ``scipy.cluster.hierarchy.dendrogram`` or ``graphviz``?
>
> I couldn't figure out the structure that
> scipy.cluster.hierarchy.dendrogram uses. That said, it should be possible
> to adapt our representation to something usable by dendrogram, and I'd
> love to merge in an example showing how to do this.

The scipy dendrogram requires the linkage format as returned by
``scipy.cluster.hierarchy.linkage``:

"A (n-1) by 4 matrix Z is returned. At the i-th iteration, clusters with
indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster
with an index less than n corresponds to one of the n original
observations. The distance between clusters Z[i, 0] and Z[i, 1] is given
by Z[i, 2]. The fourth value Z[i, 3] represents the number of original
observations in the newly formed cluster."

I was hoping to use an existing dendrogram function to plot the tree, as
they usually offer some other handy features, such as truncating the
leaves. But as the sklearn ward does not return the distances, I don't see
how the ``children_`` format can be converted to the (n-1, 4) linkage
format. I'll make sure to post a link if I come across a good solution for
the binary ward tree.

>> As a second, but slightly related question, is it possible to use the
>> ward on a n_features x n_features matrix (e.g. an adjacency matrix)?
>> It works, but I wasn't sure whether these results can be considered as
>> meaningful.
>
> Ward does not work on adjacency matrices because it is specific to the
> euclidean distance. Other hierarchical clustering methods such as
> complete linkage would work.

Right, of course that makes perfect sense.

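On the conversion question above, one possible workaround is sketched below. It is only a sketch under stated assumptions: since the real merge distances are not available, it fills the distance column with the merge order (i + 1) as a placeholder, so the resulting heights show the tree topology and are not Ward distances. It also assumes that row i of the children matrix defines cluster n_samples + i and that every cluster is formed before it is used as a child. The function name is made up for illustration.

```python
def children_to_linkage(children, n_samples):
    """Build rows shaped like a scipy linkage matrix from a children matrix.

    Each row is [left, right, placeholder_height, n_observations].
    The height is just the merge order, NOT a real cluster distance.
    """
    sizes = {}  # number of original observations under each internal node
    rows = []
    for i, (left, right) in enumerate(children):
        size = 0
        for child in (left, right):
            # Leaves count as one observation; internal nodes were sized earlier.
            size += 1 if child < n_samples else sizes[child]
        sizes[n_samples + i] = size
        rows.append([float(left), float(right), float(i + 1), float(size)])
    return rows


# Hypothetical children matrix over 4 samples, not taken from a real fit.
Z = children_to_linkage([[0, 1], [4, 2], [5, 3]], n_samples=4)
for row in Z:
    print(row)
```

Wrapped with ``numpy.asarray(Z, dtype=float)``, the result should be accepted by ``scipy.cluster.hierarchy.dendrogram``, although the vertical axis then only reflects the order of the merges.
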
Thanks again,
Matthias

> HTH,
>
> Gael

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
