What kind of data are you clustering?
Which model distribution are you using?
How many iterations are you running?
How do the cluster n= values change as you increase the number of
iterations?
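One way to answer the last question is to pull the n= value for each cluster out of the clusterdump text after each run and compare the runs side by side. The sketch below assumes the summary lines keep the "DC-<id> total= ... {n=<count>" shape shown in the output further down; the class and method names are hypothetical helpers, not part of Mahout.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterSizes {
    // Hypothetical pattern for lines like "DC-0 total= 1152000 model= GC:0{n=1152 c=[...".
    // The optional '*' tolerates emphasis markers added around n= in mail text.
    private static final Pattern LINE =
        Pattern.compile("^(DC-\\d+)\\s+total=\\s*\\d+.*?\\{\\*?n=(\\d+)");

    /** Maps cluster id to its n= value for every recognized summary line. */
    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                sizes.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "DC-0 total= 1152000 model= GC:0{n=1152 c=[0:0.014, 1:0.004",
            "DC-1 total= 0 model= GC:1{n=0 c=[0.085, 0.101, 1.617");
        Map<String, Long> sizes = parse(sample);
        // Count clusters that actually received points.
        long nonEmpty = sizes.values().stream().filter(n -> n > 0).count();
        System.out.println(sizes + " non-empty=" + nonEmpty);
    }
}
```

Running it on the dump from each iteration count shows whether points ever move out of DC-0 or stay there from the start.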
On 2/7/13 11:35 AM, Aysu Ezen wrote:
Hello,
I am having difficulty with Dirichlet process clustering and would highly
appreciate any help.
When I run Dirichlet clustering on my data, all instances end up in a
single cluster, no matter how many iterations I try.
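For a sense of how unusual one cluster is, it may help to recall the prior side of a Dirichlet process. Under a standard Chinese restaurant process with concentration parameter alpha, the expected number of occupied clusters for n points is the sum over i = 0..n-1 of alpha / (alpha + i). This is only the prior expectation; whether Mahout's Dirichlet clustering tracks it depends on the model distribution and its parameters, so treat the sketch below as a rough reference point, not a description of Mahout's implementation.

```java
public class CrpExpectedClusters {
    /** Expected number of occupied clusters for n points under a CRP prior with concentration alpha. */
    static double expectedClusters(int n, double alpha) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // Point i opens a new cluster with probability alpha / (alpha + i).
            sum += alpha / (alpha + i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // n = 1152 matches the cluster size reported in the dump below.
        for (double alpha : new double[] {0.01, 0.1, 1.0, 10.0}) {
            System.out.printf("alpha=%.2f  E[K] for n=1152: %.2f%n",
                alpha, expectedClusters(1152, alpha));
        }
    }
}
```

A very small alpha pushes the prior toward a single cluster, while larger values favor more clusters; the likelihood from the data can still override either.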
The clusterdump output looks like this:
DC-0 total= 1152000 model= GC:0{*n=1152* c=[0:0.014, 1:0.004, 2:0.001, 3:0.005, 5:0.004
...
DC-1 total= 0 model= GC:1{*n=0* c=[0.085, 0.101, 1.617, -1.592, 0.721, -1.618, 0.550, 0.302
...
I thought the problem might be in the way the input is read; however,
when I tried the Reuters dataset, the output was similar:
DC-0 total= 320 model= GC:0{*n=32* c=[2.886, 0.210, 0.167, 0.210, 0.664, 0.254, 0.486,
...
DC-1 total= 0 model= GC:1{*n=0* c=[-0.217, -0.522, 1.138, 0.399, -0.314, 1.063, -0.967,
When I use the dictionary for the Reuters dataset, it prints reasonable
words for the clusters, for example:
:DC-0 total
Top Terms:
d => 48.25068240612745
5 => 45.90837124735117
said => 44.70690381526947
topics => 44.07638777047396
22 => 39.78152487426996
companies => 38.85674291104078
date => 38.47198750451207
unknown => 38.33379830792546
reuters => 37.93209125474095
title => 37.45820361748338
:DC-1 total
Top Terms:
foreclosed => 3.973533371410058
18749 => 3.945486656800688
jannock => 3.8038475335990882
48.29 => 3.7140637347393706
asphalt => 3.6475071525946103
fragile => 3.6402008090541895
compiled => 3.584675891358228
642 => 3.5606986939331313
6.73 => 3.5492208849250027
16334 => 3.5394655632624428
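My understanding is that a "Top Terms" list is built by taking the highest-weighted dimensions of a cluster's center vector and mapping their indices back to words through the dictionary; treat that as an assumption about clusterdump rather than documented behavior. If it holds, it would explain why DC-1 still lists terms despite n=0: the terms describe the model's center, not any documents assigned to it. The helper below is a hypothetical illustration of that lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class TopTerms {
    /** Hypothetical helper: "term => weight" for the k largest entries of a center vector. */
    static List<String> topTerms(double[] center, Map<Integer, String> dictionary, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < center.length; i++) {
            idx.add(i);
        }
        // Sort dimension indices by descending center weight.
        idx.sort(Comparator.comparingDouble((Integer i) -> center[i]).reversed());
        List<String> out = new ArrayList<>();
        for (int j = 0; j < Math.min(k, idx.size()); j++) {
            int i = idx.get(j);
            // Fall back to the raw index when the dictionary has no entry.
            out.add(dictionary.getOrDefault(i, "#" + i) + " => " + center[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] center = {0.5, 2.0, 1.0};
        Map<Integer, String> dict = Map.of(0, "said", 1, "reuters", 2, "date");
        topTerms(center, dict, 2).forEach(System.out::println);
    }
}
```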
Does anyone know what causes this problem?
Thanks