Thanks for the answers, Ted. I'll take a look inside the dissector. I was just wondering because the results are quite a bit different from what's in the book (Listing 15.9). Here are those results (words with weights > 1):

body=space      2.1  sci.space
body=sale       1.9  misc.forsale
body=car        1.9  rec.autos
body=windows    1.8  comp.os.ms-windows.misc
body=mac        1.7  comp.sys.mac.hardware
body=bike       1.7  rec.motorcycles
body=apple      1.5  comp.sys.mac.hardware
body=gun        1.5  talk.politics.guns
body=baseball   1.5  rec.sport.baseball
body=graphics   1.5  comp.graphics

I guess I mostly want to understand what changed. Again, I'll take a look at the dissector, because the results of the training look pretty good.

Good luck, hope things calm down for you.

Chris

On Dec 17, 2010, at 4:58 PM, Ted Dunning wrote:

> Sorry Chris, I am snowed under for the rest of the week. Big sim analysis
> at work and finishing revisions on the book at night.
>
> If you can ping me next week, I should be able to take a look again. My
> basic expectation is that you don't have anything going wrong. If you want
> to see positive only terms, you might try tweaking the ModelDissector to
> sort on value descending rather than absolute value descending. Coefficient
> values of 0.1 are about right for a model like this and with 20 newsgroups,
> it isn't so surprising to see lots of negative weights.
>
> On Fri, Dec 17, 2010 at 11:04 AM, Chris Schilling <[email protected]> wrote:
>
>> Hey Ted,
>>
>> Any word on this? Is there something I can do to help. I am just not real
>> sure on what side of the problem I am on: dissector code or learning
>> algorithm.
>>
>>
>> On Dec 16, 2010, at 5:02 PM, Chris Schilling wrote:
>>
>>> Hey Ted,
>>>
>>> Okay. I have tested in trunk and 0.4. Pretty similar results.
>>>
>>>
>>> On Dec 16, 2010, at 4:43 PM, Ted Dunning wrote:
>>>
>>>> I think that the confusion is that many of these have negative weights.
>>>> Thus god !=> sci.space, but windows => comp.windows.x.
>>>>
>>>> Are you running from trunk or 0.4?
>>>>
>>>> On Thu, Dec 16, 2010 at 4:24 PM, Chris Schilling <[email protected]> wrote:
>>>>
>>>>> First few results of dissect()
>>>>> body=god         -0.1  sci.space                  4.0  -0.1394994576714021    5.0  -0.10322063352194852
>>>>> body=atheists    -0.1  comp.windows.x             5.0  -0.07383748917466922   1.0  -0.037205929610919175
>>>>> body=christian   -0.1  talk.politics.mideast      2.0  -0.029106552130967654  4.0  -0.0033808015660384875
>>>>> body=he           0.1  talk.politics.mideast     18.0   0.07845100216340763   5.0  -0.011218075788326903
>>>>> body=martin      -0.1  talk.politics.mideast      7.0  -0.019407188307985972 10.0   0.00782255718617942
>>>>> body=say         -0.1  comp.sys.ibm.pc.hardware   4.0  -0.0480512351042981   17.0   0.0037854045183534166
>>>>> body=windows      0.1  comp.windows.x            18.0  -0.06722265016470273   5.0  -0.009627757932247396
>>>>> body=file        -0.1  sci.med                    7.0  -0.05790809278204335   5.0  -0.050492324263356765
>>>>> body=government   0.1  talk.religion.misc         3.0  -0.06076111927305433   2.0  -0.052663471587524276
>>>>> body=sale        -0.1  talk.religion.misc        15.0  -0.03535708180324768  12.0  -0.03532746353789419
>>>>> body=atheism     -0.1  misc.forsale               8.0  -0.05941771751639946   1.0  -0.0500729187538798
>>>>> body=program     -0.1  sci.med                   16.0  -0.03820018259936702   7.0  -2.9675316187177843E-4
>>>>> body=193          0.1  talk.politics.mideast      5.0   0.05061582599095028  17.0  -0.032606809778589076
>>>>> body=his         -0.1  talk.politics.misc        12.0   0.05030942352260737   5.0  -0.04490996261214399
>>>>>
>>>>> I am not adding any leaks (leakType = 0).
>>>>>
>>>>> Any ideas here?
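
For reference, here is a rough sketch of the difference Ted describes between sorting on absolute value descending and sorting on value descending. It is plain Python, not Mahout's ModelDissector code. The feature names are borrowed from the thread, but the weights are invented for illustration, and `top_by_abs` and `top_by_value` are hypothetical helper names.

```python
# Hypothetical (feature, weight) pairs. The weights are made up for this
# sketch and are not taken from any real model output.
weights = [
    ("body=god", -0.14),
    ("body=he", 0.08),
    ("body=windows", 0.10),
    ("body=file", -0.06),
    ("body=space", 0.02),
]


def top_by_abs(pairs, n):
    """Return the n pairs with the largest magnitude, ignoring sign."""
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:n]


def top_by_value(pairs, n):
    """Return the n pairs with the largest signed weight (positives first)."""
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]


by_abs = top_by_abs(weights, 3)
by_value = top_by_value(weights, 3)

print("absolute value descending:", by_abs)
print("value descending:         ", by_value)
```

With the absolute-value ordering, a strongly negative term such as the made-up `body=god` weight ranks first. With the signed ordering, only the largest positive weights surface, which is closer in spirit to a positive-only listing like the one in the book.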
