James Ravenscroft wrote: >> > I can't run your code because you didn't make it standalone, > Thanks for the heads up, I've made a simple version of the clusterer > which you can view on pastebin: http://pastebin.com/7HmAkmfj If you have > time to look through my code I would be very grateful! > > > >> > but in your case that is probably not enough. >> > Try something along these lines: >> > >> > # untested >> > while len(self.clusters) > 1: >> > c = self.clusters.pop() >> > # find closest index >> > for i, c2 in enumerate(self.clusters): >> > ... >> > if ...: >> > closest_index = i >> > closest = self.clusters.pop(closest_index) >> > tmp.append(c + closest) >> > if self.clusters: >> > tmp.append(self.clusters[0]) > I had a go at implementing something along the lines above and I'm still > getting very bizarre results. There does appear to be some sort of logic > to it though, if you look at the graph chart, you can see that It seems > to be doing the clustering right and then forgetting to remove the old > groupings providing this "cloning" effect for some cluster groups.
I'm sorry I can't infer the intended algorithm from the code you provide. Perhaps you can give a short description in plain English? However, here's the implementation of the algorithm you mention as described on wikipedia: http://en.wikipedia.org/wiki/Cluster_analysis#Agglomerative_hierarchical_clustering Repeatedly merge the two closest clusters until there's only one left. To keep track of the two merged clusters I've added a "children" key to your node dictionaries. The intermediate states of the tree are also put into the levels variable (I suppose that's the purpose of your levels attribute). The main difference to your code is that (1) list modifications occur only in the outer while loop, so both inner loops can become for-loops again. (2) test all pairs before choosing the pair debug = True def cluster(self): '''Run the clustering operation on the files ''' clusters = [] #add a cluster for each track to clusters for song in self.music: clusters.append({'data' : [song]}) levels = [] while len(clusters) > 1: # calculate weights for c in clusters: c["weight"] = self.__getClusterWeight(c['data']) # find closest pair closestDelta = float('inf') closestPair = None for i, a in enumerate(clusters): for k, b in enumerate(clusters): if i == k: break delta = abs(a['weight'] - b['weight']) if delta < closestDelta: closestPair = i, k closestDelta = delta # merge closest pair hi, lo = closestPair left = clusters[lo] right = clusters.pop(hi) clusters[lo] = {"data": left["data"] + right["data"], "children": [left, right]} # keep track of the intermediate tree states levels.append(list(clusters)) if self.debug: print ("stage %d" % len(levels)).center(40, "-") for c in clusters: print c["data"] # store tree self.clusters = clusters self.levels = levels Note that there's some potential for optimisation. For example, you could move the weight calculation out of the while-loop and just calculate the weight for the newly merged node. Peter -- http://mail.python.org/mailman/listinfo/python-list