Hi,

I am able to run the o.a.m.classifier.sgd.TrainNewsGroups example. However, I 
am getting strange results for the top-weighted features from the dissector. 
Here are some snippets from the output.

Training and evaluation:
...
0.00    0.00    0.00    0.00    0.0000000       0.0000000       700     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       800     0.000   0.00    none
0.23    186868.00       52255.00        1265.34 1.3754325e-07   1.0028590e-08   1000    -2.608  34.71   none
0.23    186868.00       52255.00        1265.34 1.3754325e-07   1.0028590e-08   1200    -2.608  34.71   none
0.23    186868.00       52255.00        1265.34 1.3754325e-07   1.0028590e-08   1400    -2.608  34.71   none
0.23    186868.00       52255.00        1265.34 1.3754325e-07   1.0028590e-08   1500    -2.608  34.71   none
1.04    189460.00       55622.00        4651.08 2.5962146e-08   1.0060092e-08   2000    -1.768  56.43   none
1.09    189837.00       60531.00        5314.93 3.1550927e-08   1.0048498e-08   2500    -1.191  71.41   none
...
1.12    189992.00       68364.00        6384.13 2.4446595e-07   1.0034217e-08   6000    -0.880  80.90   none
1.14    189991.00       68775.00        6439.89 3.0565171e-07   1.0033774e-08   7000    -0.849  82.27   none
1.16    189995.00       69360.00        6491.92 3.0565171e-07   1.0000002e-08   8000    -0.860  81.02   none
1.16    189999.00       69919.00        6527.12 3.0566116e-07   1.0000000e-08   10000   -0.851  82.40   none

So, I am running over the files in /20news-bydate/20news-bydate-train. I think 
the above looks reasonable; at least, I like the ~80% accuracy of the 
classifier. However, when I look at the top results from the dissector, the 
features do not match the similar results given in Listing 15-9 of Mahout in 
Action (MIA). In fact, they do not make any sense at all to me.

First few results of dissect():
body=god        -0.1    sci.space       4.0     -0.1394994576714021     5.0     -0.10322063352194852
body=atheists   -0.1    comp.windows.x  5.0     -0.07383748917466922    1.0     -0.037205929610919175
body=christian  -0.1    talk.politics.mideast   2.0     -0.029106552130967654   4.0     -0.0033808015660384875
body=he 0.1     talk.politics.mideast   18.0    0.07845100216340763     5.0     -0.011218075788326903
body=martin     -0.1    talk.politics.mideast   7.0     -0.019407188307985972   10.0    0.00782255718617942
body=say        -0.1    comp.sys.ibm.pc.hardware        4.0     -0.0480512351042981     17.0    0.0037854045183534166
body=windows    0.1     comp.windows.x  18.0    -0.06722265016470273    5.0     -0.009627757932247396
body=file       -0.1    sci.med 7.0     -0.05790809278204335    5.0     -0.050492324263356765
body=government 0.1     talk.religion.misc      3.0     -0.06076111927305433    2.0     -0.052663471587524276
body=sale       -0.1    talk.religion.misc      15.0    -0.03535708180324768    12.0    -0.03532746353789419
body=atheism    -0.1    misc.forsale    8.0     -0.05941771751639946    1.0     -0.0500729187538798
body=program    -0.1    sci.med 16.0    -0.03820018259936702    7.0     -2.9675316187177843E-4
body=193        0.1     talk.politics.mideast   5.0     0.05061582599095028     17.0    -0.032606809778589076
body=his        -0.1    talk.politics.misc      12.0    0.05030942352260737     5.0     -0.04490996261214399

I am not adding any leaks (leakType = 0).  

Any ideas here?  
Thanks
Chris S.
