Dear All, I am currently trying to write a simple Agglomerative Clustering algorithm which sorts through my MP3 collection and uses associated Last.FM tags to cluster files into 'genres'. Unfortunately, I'm having some trouble with my algorithm and some tracks are ending up in multiple clusters. I have a feeling this is because of (my lack of understanding of) list behaviours within Python. This is all purely a learning exercise, so any input would be greatly appreciated
The actual clustering method can be seen described here. Each Cluster is stored within the 'clusters' array and has two elements: the data containing song metadata such as artist, title and most importantly tags, and a weight: that is, the overall Euclidean weight of the cluster (based on the song data within). In theory, the algorithm runs in a loop until all clusters have been merged into a hierarchy. It takes the weight of each cluster and merges each cluster with their closest counterpart. My code has a lot of comments so it should be fairly easy to understand. Please see below: #-----------------------------Code Snippet -----------------------------------------------# def cluster(self): '''Run the clustering operation on the files ''' #add a cluster for each track to clusters for song in self.music.keys(): self.clusters.append({ 'data' : [song], 'weight' : long(0) }) currentLevel = 0 #current level of hierarchy #while there is more than one cluster in the sorting bank # i.e. we haven't achieved hierachy yet, run the algorithm while( len(self.clusters) > 1): print "Currently at Level %d" % currentLevel print "There are %d clusters at this level" % len(self.clusters) #store the level in the levels array self.levels.append(self.clusters) #empty the clusters list self.clusters = [] #find the weight of each current cluster for c in self.levels[currentLevel]: self.clusters.append({'data' : c['data'], 'weight' : self.__getClusterWeight(c['data'])}) #do the actual clustering tmp = [] for c in self.clusters: closestCluster = None closestDelta = float('inf') for c2 in self.clusters: #skip if this is the same node if(c == c2): continue delta = abs(c2['weight'] - c['weight']) if(delta < closestDelta): closestCluster = c2 closestDelta = delta print "Merging clusters %(c1)d and %(c2)d" % {'c1' : self.clusters.index(c), 'c2' : self.clusters.index(closestCluster)} #now merge the two clusters self.clusters.remove(closestCluster) self.clusters.remove(c) c['data'].extend(closestCluster['data']) tmp.append(c) #increment the level of the hierarchy self.clusters = tmp currentLevel += 1 #--------------------------------End Code Snippet ----------------------------# Thanks, James -- James Ravenscroft http://blog.funkymonkeysoftware.com/ -- http://mail.python.org/mailman/listinfo/python-list