Hey Ted,

Just FYI, I changed the Weight subclass of the ModelDissector to sort by true (signed) value rather than absolute value, and reran it over the 20 Newsgroups data. Here are the results of the dissector function:

body=rt 0.042 comp.sys.mac.hardware
body=computer 0.039 sci.electronics
body=seem 0.035 talk.religion.misc
body=mike 0.035 misc.forsale
body=windows 0.034 misc.forsale
body=just 0.032 sci.crypt
body=supports 0.032 talk.politics.mideast
body=x 0.032 talk.religion.misc
body=do 0.029 rec.motorcycles
body=university 0.028 comp.sys.mac.hardware
body=slagle 0.028 rec.sport.hockey

I prefer the results from MIA :)

Anyway, I know you are busy. If there is anything I can do to help, let me know. I'm still getting familiar with the code, but I could help out with some guidance.

Thanks a lot,
Chris

On Dec 17, 2010, at 7:37 PM, Ted Dunning wrote:

> Hard to say what changed just offhand. I was tweaking the SGD code pretty
> regularly as I learned from the results users were getting. I should look
> at the history to review what happened... some changes may not have been
> good.
>
> On Fri, Dec 17, 2010 at 5:28 PM, Chris Schilling
> <[email protected]> wrote:
>
>> Thanks for the answers, Ted. I'll take a look inside the dissector. I was
>> just wondering because the results are quite a bit different from what's in
>> the book (Listing 15.9). Here are those results (where words have weights > 1):
>>
>> body=space 2.1 sci.space
>> body=sale 1.9 misc.forsale
>> body=car 1.9 rec.autos
>> body=windows 1.8 comp.os.ms-windows.misc
>> body=mac 1.7 comp.sys.mac.hardware
>> body=bike 1.7 rec.motorcycles
>> body=apple 1.5 comp.sys.mac.hardware
>> body=gun 1.5 talk.politics.guns
>> body=baseball 1.5 rec.sport.baseball
>> body=graphics 1.5 comp.graphics
>>
>> I guess I mostly want to understand what changed. Again, I'll take a look
>> at the dissector, because the results of the training look pretty good.
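For reference, the difference between the two orderings can be sketched in a few lines of Java. This is a minimal, hypothetical illustration only: `WeightSketch`, `Entry`, `BY_ABS`, `BY_SIGNED` and `topN` are made-up names and are not Mahout's actual ModelDissector or Weight API. It assumes each dissector row is a (feature, category, weight) triple like the ones printed above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch (not Mahout's ModelDissector API): contrasts ranking
 * feature weights by absolute value versus by signed ("true") value.
 */
public class WeightSketch {

    // One (feature, category, weight) triple, like a row of dissector output.
    public record Entry(String feature, String category, double weight) {}

    // Largest magnitude first: strongly negative weights rank alongside strongly positive ones.
    public static final Comparator<Entry> BY_ABS =
            Comparator.comparingDouble((Entry e) -> Math.abs(e.weight())).reversed();

    // Largest signed value first: only weights pushing toward a category rank high.
    public static final Comparator<Entry> BY_SIGNED =
            Comparator.comparingDouble(Entry::weight).reversed();

    // Returns the first n entries under the given ordering, leaving the input untouched.
    public static List<Entry> topN(List<Entry> entries, Comparator<Entry> order, int n) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("a", "x", 0.5),
                new Entry("b", "y", -2.0),
                new Entry("c", "z", 1.0));
        System.out.println("by |weight|: " + topN(entries, BY_ABS, 2));
        System.out.println("by weight:   " + topN(entries, BY_SIGNED, 2));
    }
}
```

With the toy data, the strongly negative entry "b" ranks first under absolute-value ordering but drops to last under signed ordering, which is the kind of shift that changes which words show up at the top of the dissector output.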
