Can't say just off-hand.

What is the data?

On Mon, Jul 5, 2010 at 8:20 AM, Grant Ingersoll <[email protected]> wrote:

> I'm running ClusterLabels and it seems to be outputting the same values for
> every centroid [1].  When I run the cluster dumper, the top terms are fairly
> different for those same vectors.
>
> Have I hit a vagary of LLR or is this a bug?
>
>
> Thanks,
> Grant
>
>
> [1]
> <snip>
> Top labels for Cluster 129062 containing 22710 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               43269.00830466254               0               72060
> his             7185.503760070074               0               17203
> has             7028.243643655442               0               16855
> from            6415.739411605988               0               15488
> year            5930.141497239005               0               14391
> state           5858.43069797568                0               14228
> said            5616.422720833216               0               13676
> it              5545.207108973991               0               13513
> he              5239.340392438695               0               12810
> new             4830.124521905556               0               11862
>
> Top labels for Cluster 129145 containing 11188 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               19576.26998734614               0               72060
> his             3352.5135342599824              0               17203
> has             3279.466228939127               0               16855
> from            2994.8128935270943              0               15488
> year            2768.974903047085               0               14391
> state           2735.612128134351               0               14228
> said            2622.997358441353               0               13676
> it              2589.8515553446487              0               13513
> he              2447.4579147226177              0               12810
> new             2256.8640938592143              0               11862
>
> Top labels for Cluster 129201 containing 13040 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               23110.173012922285              0               72060
> his             3940.4691014224663              0               17203
> has             3854.554399965331               0               16855
> from            3519.784154796507               0               15488
> year            3254.2127395244315              0               14391
> state           3214.9822960514575              0               14228
> said            3082.565408431459               0               13676
> it              3043.5924300444312              0               13513
> he              2876.171367166564               0               12810
> new             2652.0934832417406              0               11862
>
> Top labels for Cluster 129211 containing 14053 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               25083.46391701023               0               72060
> his             4266.378291217145               0               17203
> has             4173.323467798065               0               16855
> from            3810.7467373879626              0               15488
> year            3523.1337431534193              0               14391
> state           3480.648573280778               0               14228
> said            3337.2482196930796              0               13676
> it              3295.0432900944725              0               13513
> he              3113.741967030335               0               12810
> new             2871.0957860480994              0               11862
>
> Top labels for Cluster 129242 containing 12861 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               22764.503256496973              0               72060
> his             3883.2002838114277              0               17203
> has             3798.5396822127514              0               16855
> from            3468.6536546614952              0               15488
> year            3206.954131908249               0               14391
> state           3168.2954448102973              0               14228
> said            3037.808057511691               0               13676
> it              2999.402857856825               0               13513
> he              2834.4202939094976              0               12810
> new             2613.604658874683               0               11862
>
> Top labels for Cluster 129245 containing 6443 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               10925.268199045677              0               72060
> his             1890.511348863598               0               17203
> has             1849.385320336558               0               16855
> from            1689.0946326381527              0               15488
> year            1561.8904545903206              0               14391
> state           1543.096286157146               0               14228
> said            1479.652662154287               0               13676
> it              1460.9780013803393              0               13513
> he              1380.745082413312               0               12810
> new             1273.3357145632617              0               11862
>
> Top labels for Cluster 129255 containing 11390 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               19957.211259535048              0               72060
> his             3416.1555761522613              0               17203
> has             3341.7163103362545              0               16855
> from            3051.6410844950005              0               15488
> year            2821.504116652999               0               14391
> state           2787.5064550531097              0               14228
> said            2672.7490201727487              0               13676
> it              2638.972676954698               0               13513
> he              2493.870809029322               0               12810
> new             2299.653438703157               0               11862
>
> Top labels for Cluster 129265 containing 9461 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               16362.85457371641               0               72060
> his             2813.167819214519               0               17203
> has             2751.908798408229               0               16855
> from            2513.176188033074               0               15488
> year            2323.752471229993               0               14391
> state           2295.767774611246               0               14228
> said            2201.3039346230216              0               13676
> it              2173.4997256915085              0               13513
> he              2054.0495802331716              0               12810
> new             1894.1558320098557              0               11862
>
> Top labels for Cluster 129279 containing 14559 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               26080.197364640888              0               72060
> his             4430.338072712999               0               17203
> has             4333.689091425855               0               16855
> from            3957.116204748396               0               15488
> year            3658.40981121175                0               14391
> state           3614.286633652635               0               14228
> said            3465.358771919273               0               13676
> it              3421.527382406406               0               13513
> he              3233.2411222746596              0               12810
> new             2981.251407010015               0               11862
>
> Top labels for Cluster 129290 containing 13592 vectors
> Term             LLR             In-ClusterDF            Out-ClusterDF
> a               24181.82589298836               0               72060
> his             4117.6785482652485              0               17203
> has             4027.8821644652635              0               16855
> from            3677.9947950267233              0               15488
> year            3400.440033295192               0               14391
> state           3359.4400672735646              0               14228
> said            3221.0516651300713              0               13676
> it              3180.321518546436               0               13513
> he              3005.353873868007               0               12810
> new             2771.180380204227               0               11862
> </snip>
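
One thing that stands out in the output above: In-ClusterDF is 0 for every term and Out-ClusterDF is identical across clusters, so only the cluster size changes from one table to the next. LLR is two-sided, so a term that is completely absent from a cluster can still score high, and with an in-cluster count of 0 the ranking just follows Out-ClusterDF. Here is a minimal sketch of that effect. It is not the ClusterLabels code: the 2x2 table layout in score_df and the corpus size of 130000 are assumptions made purely for illustration.

```python
import math


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing to the sum
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def score_df(in_df, out_df, cluster_size, corpus_size):
    """Assumed layout: rows = in/out of cluster, columns = has/lacks term."""
    k11 = in_df                                # in cluster, has term
    k12 = cluster_size - in_df                 # in cluster, lacks term
    k21 = out_df                               # outside cluster, has term
    k22 = corpus_size - cluster_size - out_df  # outside cluster, lacks term
    return llr(k11, k12, k21, k22)


if __name__ == "__main__":
    corpus_size = 130000  # hypothetical; the real corpus size is not given
    cluster_size = 22710  # size of cluster 129062 from the output above
    for term, out_df in (("a", 72060), ("his", 17203), ("new", 11862)):
        score = score_df(0, out_df, cluster_size, corpus_size)
        print(term, score)
```

With in_df fixed at 0 the score grows with out_df, which would give the same term order for every cluster, as in the tables above. That does not settle whether the zeros themselves are expected; it only shows that the identical rankings follow from them.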