Hi, I am able to run the o.a.m.classifier.sgd.TrainNewsGroups example. However, I am getting strange results for the top-weighted features from the dissector. Here are some snippets from the output.
Training and evaluation:

...
0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
0.23 186868.00 52255.00 1265.34 1.3754325e-07 1.0028590e-08 1000 -2.608 34.71 none
0.23 186868.00 52255.00 1265.34 1.3754325e-07 1.0028590e-08 1200 -2.608 34.71 none
0.23 186868.00 52255.00 1265.34 1.3754325e-07 1.0028590e-08 1400 -2.608 34.71 none
0.23 186868.00 52255.00 1265.34 1.3754325e-07 1.0028590e-08 1500 -2.608 34.71 none
1.04 189460.00 55622.00 4651.08 2.5962146e-08 1.0060092e-08 2000 -1.768 56.43 none
1.09 189837.00 60531.00 5314.93 3.1550927e-08 1.0048498e-08 2500 -1.191 71.41 none
...
1.12 189992.00 68364.00 6384.13 2.4446595e-07 1.0034217e-08 6000 -0.880 80.90 none
1.14 189991.00 68775.00 6439.89 3.0565171e-07 1.0033774e-08 7000 -0.849 82.27 none
1.16 189995.00 69360.00 6491.92 3.0565171e-07 1.0000002e-08 8000 -0.860 81.02 none
1.16 189999.00 69919.00 6527.12 3.0566116e-07 1.0000000e-08 10000 -0.851 82.40 none

So, I am running over the files in /20news-bydate/20news-bydate-train. I think the above looks reasonable. At least, I like the ~80% accuracy of the classifier.

Now, when I look at the top results from the dissector, the features do not make sense (at least compared to similar results given in Listing 15-9 of MIA). In fact, these do not make any sense at all to me.
First few results of dissect():

body=god -0.1 sci.space 4.0 -0.1394994576714021 5.0 -0.10322063352194852
body=atheists -0.1 comp.windows.x 5.0 -0.07383748917466922 1.0 -0.037205929610919175
body=christian -0.1 talk.politics.mideast 2.0 -0.029106552130967654 4.0 -0.0033808015660384875
body=he 0.1 talk.politics.mideast 18.0 0.07845100216340763 5.0 -0.011218075788326903
body=martin -0.1 talk.politics.mideast 7.0 -0.019407188307985972 10.0 0.00782255718617942
body=say -0.1 comp.sys.ibm.pc.hardware 4.0 -0.0480512351042981 17.0 0.0037854045183534166
body=windows 0.1 comp.windows.x 18.0 -0.06722265016470273 5.0 -0.009627757932247396
body=file -0.1 sci.med 7.0 -0.05790809278204335 5.0 -0.050492324263356765
body=government 0.1 talk.religion.misc 3.0 -0.06076111927305433 2.0 -0.052663471587524276
body=sale -0.1 talk.religion.misc 15.0 -0.03535708180324768 12.0 -0.03532746353789419
body=atheism -0.1 misc.forsale 8.0 -0.05941771751639946 1.0 -0.0500729187538798
body=program -0.1 sci.med 16.0 -0.03820018259936702 7.0 -2.9675316187177843E-4
body=193 0.1 talk.politics.mideast 5.0 0.05061582599095028 17.0 -0.032606809778589076
body=his -0.1 talk.politics.misc 12.0 0.05030942352260737 5.0 -0.04490996261214399

I am not adding any leaks (leakType = 0). Any ideas here?

Thanks
Chris S.
