On Fri, Dec 28, 2012 at 12:35 AM, Dan Filimon <[email protected]> wrote:

> I have a couple of questions:
> - how did you pick 1000 as the dimension of the vectors?
>

Out of nowhere.  Partly motivated by a desire to be able to pull the data
into R.


>  - what is spoking behavior? is it that there seem to be some lines
> going through the origin that points tend to be on?
>

Yes.


> - when you say you built a multinomial model, how did you see strong
> signals? I'm not sure how you used it actually. :)
>

> library(nnet)
> m=multinom(group ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15, x0)
> m
Call:
multinom(formula = group ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 +
    x8 + x9 + x10 + x11 + x12 + x13 + x14 + x15, data = x0)

Coefficients:
                         (Intercept)          x1         x2         x3
comp.graphics             0.44074296  1.01907543 -1.8568888 -9.1031682
comp.os.ms-windows.misc   0.45704221 -6.54596820 -0.5043770 -1.1127980
comp.sys.ibm.pc.hardware  0.29866502 -5.55952654 -0.8972155 -9.0253191
comp.sys.mac.hardware     0.45269879 -3.32835314 -5.5403978 -5.1908325
comp.windows.x            0.63847150 -4.40205097 -2.4856457 -5.2730532
misc.forsale              0.59012615  4.19061796 -3.8970935 -5.1713703
rec.autos                 0.34494436 -2.60865621 -0.6769143 -3.8443432
rec.motorcycles           0.55637401 -1.86907459 -2.6722765  3.8598477
rec.sport.baseball        0.64776087  1.85164391 -5.5293810 -0.6844863
rec.sport.hockey          0.26954741 -0.06203175 -0.3972151  8.4481839
sci.crypt                 0.07929986  3.60519397  2.1716362 -5.4979043
sci.electronics           0.31714050 -1.33324287  0.7805920 -4.8241339
sci.med                   0.29779539  4.31605496 -0.8727921  2.9589061
sci.space                 0.20997964  2.39444885  1.9581433 -1.1467414
soc.religion.christian   -0.02225500  1.11565767  3.6428443  0.7498387
talk.politics.guns        0.11895828  5.39121244 -0.1921129 -0.9881931
talk.politics.mideast     0.03808929 -0.84748051  2.0709237  0.6018028
talk.politics.misc       -0.21078137  1.37337160  4.5629442  0.6928847
talk.religion.misc       -0.36185050  0.03655827  0.5980048  0.5054213
                                x4         x5          x6         x7         x8
comp.graphics             9.033334  7.0192748  0.16837265  0.2187211 -3.6491393
comp.os.ms-windows.misc   7.105049  0.2416844  6.47463665  2.6865188  0.9813021
comp.sys.ibm.pc.hardware 11.208601  5.1641788  6.89008112  6.8821034 -1.1563185
comp.sys.mac.hardware    10.960475 -1.9092041  2.40371401 -1.0299831 -0.3979949
comp.windows.x            7.831094  0.1186285  0.23514096 -1.6337241 -3.9838201
misc.forsale              5.769901  6.7159812 -1.93968915  2.6798810  1.1244947
rec.autos                 3.865056  0.1188188  6.30501368  1.1181748 -4.0945192
rec.motorcycles           4.370033  1.3509225 -2.51137387  1.7889965 -0.2231039
rec.sport.baseball        4.905747  1.7073982  6.01976440 -2.0471183 -4.9937091
rec.sport.hockey          4.575955 -0.0869318 -1.85252683  2.7528117 -3.3982038
sci.crypt                 8.485397  3.4533590  2.37158269 -0.7041111 -6.4355892
sci.electronics           8.807797 -1.1365877  1.01298828  2.3025815 -3.7304551
sci.med                   4.251672  1.6995090  3.30625008 -0.8523060 -2.3018664
sci.space                 9.040464  5.9714280  2.79637661  0.8799763 -5.8195717
soc.religion.christian    6.907276  0.4557410 -0.98188302 -2.1058969 -4.5960733
talk.politics.guns        7.149903 -2.9766376  1.87073733  4.8228957 -3.6893940
talk.politics.mideast     6.050361  1.9889525 -3.17168544 -2.3966889  8.1085935
talk.politics.misc        8.555592  1.4756449 -0.47450379  3.8521708 -3.4629278
talk.religion.misc        9.774690  4.1712967  0.03906819  1.2280719 -6.7910743
                                  x9         x10        x11         x12
comp.graphics            -0.39312040  -3.2572069 -7.4439559  -5.5429428
comp.os.ms-windows.misc   4.09046505  -6.4363098 -6.2324448  -6.1287834
comp.sys.ibm.pc.hardware  0.02015251  -1.9149433 -3.6751577  -5.2145510
comp.sys.mac.hardware     3.81277265  -9.4835151 -5.6315064  -2.3164148
comp.windows.x            3.33875226 -12.5747792 -9.4202176  -3.1903203
misc.forsale              0.08603018  -6.1517657 -5.6218740  -2.2750419
rec.autos                 0.88637347  -3.8640771 -3.7595583  -2.1053560
rec.motorcycles          -2.31446575 -11.3232215 -4.9337654  -3.2533249
rec.sport.baseball        0.47952604  -4.4506352 -3.7291590 -13.3192078
rec.sport.hockey          1.56687927  -3.4629955  0.8472809  -5.4775734
sci.crypt                 2.56905183  -6.1068418  0.8769075  -1.4013213
sci.electronics          -2.93202536   1.9462835 -4.2089267  -2.1560806
sci.med                   3.72384120  -5.6507869 -6.4614948  -2.5490348
sci.space                -2.49644737  -4.9534675 -5.6519521  -2.6338873
soc.religion.christian    2.67908535   1.1556453  6.6814628  -3.3196256
talk.politics.guns       -1.10974910  -5.4778037 -2.9293610  -1.1856183
talk.politics.mideast     5.72453312  -2.6987484  0.3532723  -4.3997810
talk.politics.misc        1.18303190  -5.3802093 -3.4711458   0.4928157
talk.religion.misc        0.45942122  -0.1637296  3.4285962  -6.9764489
                                x13         x14         x15
comp.graphics             -3.287707  0.35639470 -6.91246416
comp.os.ms-windows.misc   -5.875776 -3.33726744 -6.89408072
comp.sys.ibm.pc.hardware -11.347861 -2.39922482 -1.47182928
comp.sys.mac.hardware     -3.979639 -0.05450003 -0.06150157
comp.windows.x            -1.753831 -2.01337073 -3.80837072
misc.forsale             -19.850634  0.98566399 -7.73691044
rec.autos                 -5.288611  3.52664949  1.49950946
rec.motorcycles           -4.760638  2.91862795 -3.38732795
rec.sport.baseball       -13.446201 -6.45080449 -1.48063043
rec.sport.hockey          -9.746940 -1.58992997  3.76260962
sci.crypt                  1.106581  0.31910767  0.27873970
sci.electronics           -7.679123  1.32204155  0.72514034
sci.med                   -2.625328 -3.55163264 -0.44419996
sci.space                 -7.155133  1.61298084  5.15959340
soc.religion.christian    -3.812396  1.15236988  1.45284026
talk.politics.guns        -6.649506 -2.65037735  7.37585392
talk.politics.mideast     -2.596350 -2.90530800  2.50239590
talk.politics.misc        -1.123491  1.02627441  4.08961917
talk.religion.misc        -3.643682 -0.33512039  1.78958648

Residual Deviance: 66003.82
AIC: 66611.82
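
For anyone who wants to check for signal in a fit like this without reading the coefficients by eye, here is a minimal sketch on synthetic data. The data frame `d`, its three groups, and the two features are made up for illustration and are not the newsgroup vectors. It compares the fitted model against an intercept-only model and cross-tabulates predictions against the true labels. As a sanity check on the output above, AIC minus residual deviance is 608, which is 2 x 304, i.e. 19 non-reference classes times 16 coefficients each.

```r
library(nnet)

set.seed(1)

# Synthetic stand-in for x0 (hypothetical): 3 groups, 2 features.
n <- 300
group <- factor(rep(c("a", "b", "c"), each = n / 3))
centers <- matrix(c(0, 0,
                    3, 0,
                    0, 3), ncol = 2, byrow = TRUE)
x <- centers[as.integer(group), ] + matrix(rnorm(2 * n), ncol = 2)
d <- data.frame(group = group, x1 = x[, 1], x2 = x[, 2])

# Full model and an intercept-only baseline.
fit  <- multinom(group ~ x1 + x2, data = d, trace = FALSE)
null <- multinom(group ~ 1, data = d, trace = FALSE)

# How much deviance the predictors remove relative to the baseline.
deviance_drop <- null$deviance - fit$deviance

# In-sample predictions versus the true labels.
pred <- predict(fit, d)
accuracy <- mean(pred == d$group)
confusion <- table(actual = d$group, predicted = pred)

print(confusion)
print(c(deviance_drop = deviance_drop, accuracy = accuracy))
```

A large deviance drop and a confusion table concentrated on the diagonal would both suggest the predictors carry real signal. In-sample accuracy is optimistic, so a held-out split would be the more careful check.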


Also, I did some experiments with clusters constructed based on each
newsgroup:

> m=aggregate(x0[,3:1002], by=list(group=x0$group), FUN=mean)
> plot(apply(m[,2:1001], MARGIN=1, FUN=function(v) {sqrt(sum((v-x0[9000,3:1002])^2))}))
> (x0$group[9000])
[1] soc.religion.christian
20 Levels: alt.atheism comp.graphics ... talk.religion.misc
> as.numeric(x0$group[9000])
[1] 16


Note that element 16, the mean of the point's own group, has the lowest
distance in the plot, and that the next-lowest elements are related groups.
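
The same centroid experiment can be sketched on a small synthetic data set. Everything here (the data frame `d`, three groups in five dimensions, row 120 as the probe point) is an assumption for illustration, not the actual 1000-dimensional data:

```r
set.seed(2)

# Synthetic stand-in (hypothetical): 3 groups in 5 dimensions,
# each group centred on a different axis.
n_per <- 50
group <- factor(rep(c("a", "b", "c"), each = n_per))
centers <- diag(5)[1:3, ] * 5
feats <- centers[as.integer(group), ] +
  matrix(rnorm(3 * n_per * 5), ncol = 5)
d <- data.frame(group = group, feats)

# Per-group mean vectors, one row per group.
centroids <- aggregate(d[, -1], by = list(group = d$group), FUN = mean)

# Euclidean distance from one point to every group mean.
i <- 120
p <- unlist(d[i, -1])
dists <- apply(centroids[, -1], MARGIN = 1,
               FUN = function(v) sqrt(sum((v - p)^2)))
names(dists) <- as.character(centroids$group)

# Group whose mean is closest to the probe point.
nearest <- names(which.min(dists))
print(dists)
print(nearest)
```

Sorting `dists` instead of taking only the minimum shows which other groups come next, which is the "related groups" effect described above.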
