I do not see any relationship between the cluster weight vector and the pdf vector. Both are normalized to sum to one. The pdf vector is closer to a uniform distribution than the weight vector from the clusteredPoints file. Both vectors have their maximum at the same cluster. Apart from that, I see nothing in common.
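To make the comparison concrete, here is a minimal sketch of the textbook fuzzy k-means membership formula, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), computed from the distances of one point to every cluster center. It is not Mahout's code: the class FuzzyMembership, the method memberships and the EPSILON clamp for zero distances are hypothetical names introduced for illustration, and I am only assuming that computeProbWeight is meant to follow the same formula.

```java
import java.util.Arrays;

final class FuzzyMembership {

    // Assumption: zero distances are clamped to a small value to avoid division by zero.
    private static final double EPSILON = 1e-10;

    /**
     * Textbook fuzzy k-means memberships for a single point.
     *
     * @param distances distance from the point to each cluster center
     * @param m         fuzziness parameter, must be greater than 1
     * @return one membership per cluster; the values sum to 1
     */
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzziness m must be > 1");
        }
        double exponent = 2.0 / (m - 1.0);
        double[] result = new double[distances.length];
        for (int i = 0; i < distances.length; i++) {
            double di = Math.max(distances[i], EPSILON);
            double denom = 0.0;
            for (double d : distances) {
                double dj = Math.max(d, EPSILON);
                denom += Math.pow(di / dj, exponent);
            }
            result[i] = 1.0 / denom;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical distances of one point to three cluster centers.
        double[] u = memberships(new double[] {1.0, 2.0, 4.0}, 2.0);
        System.out.println(Arrays.toString(u));
    }
}
```

If the weights in the clusteredPoints file came from this formula, they would sum to one for each point and the closest cluster would get the largest weight, which is what I would compare both vectors against.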
Best regards
Sebastian

Jeff Eastman <[email protected]> wrote:

>On 3/22/13 10:39 AM, Sebastian Briesemeister wrote:
>> Dear all,
>>
>> I am facing troubles when retrieving the cluster probabilities of instances:
>>
>> I am clustering instances using the FuzzyKMeansDriver.
>> Afterwards, I am reading instances of WeightedVectorWritable from the
>> clusteredPoints file (e.g. part-m-0).
>>
>> 1.)
>> When I am clustering in a sequential manner (no map-reduce), the
>> weights of the vectors are reasonable probabilities for the clusters.
>> However, when I am running FuzzyKMeansDriver with sequential=false, the
>> weight of each vector equals one for EVERY cluster. So the weights do
>> not even sum up to 1.
>>
>> Am I doing something wrong here?
>It sounds like you may have found a bug in the MR version. Those
>probabilities should be the same.
>>
>>
>> 2.)
>> I tried to circumvent the problem, by using the FuzzyKMeansClusterer:
>> After clustering, I retrieved the final clusters (Class Cluster) and
>> calculated the distance of every instance to each of the cluster
>> centers. Then I calculated the probabilities for each cluster using the
>> computeProbWeight method of FuzzyKMeansClusterer.
>>
>> Interestingly, these probabilities differ from the probabilities I get
>> from the WeightedVectorWritable instances in the clusteredPoints file
>> when clustering with sequential=true.
>>
>> Why is there a difference between the vector weights and the pdfs??
>The pdf vectors are normalized I believe
>>
>> Thank you all in advance,
>> Sebastian
