Excellent hint Chris. This will make it much easier for me to track down.

On Tue, Dec 28, 2010 at 1:05 PM, Chris Schilling <[email protected]> wrote:
> Hey Ted,
>
> I went back in time a bit and found a version which returned reasonable
> looking results (at least results which are comparable to those in the
> book). I ran 'svnversion .' and the older (apparently working version)
> returned 1004406M whereas the trunk version I am using is 1050223M. In any
> case, the files in core/o.a.m.classifier.sgd are dated October 4th. It
> looks like between the 4th of Oct and December 7th there was some
> refactoring going on. For instance, the encoders were moved to the vectors
> package (as opposed to the vectorizer.encoders package). I spent a little
> time comparing diffs in the core sgd package but not enough time to discover
> what could be causing this behavior.
>
> I hope this helps.
> Chris
>
> On Dec 20, 2010, at 4:25 PM, Ted Dunning wrote:
>
> > Yeah... it looks like I really need to jump into this. These results are
> > not right.
> >
> > On Mon, Dec 20, 2010 at 2:11 PM, Chris Schilling <[email protected]> wrote:
> >
> >> Hey Ted,
> >>
> >> Just FYI,
> >>
> >> I changed the Weight subclass of the ModelDissector to sort by true value
> >> (rather than absolute value) and reran over the 20 newsgroups data. Here
> >> are the results of the dissector function:
> >>
> >> body=rt          0.042  comp.sys.mac.hardware
> >> body=computer    0.039  sci.electronics
> >> body=seem        0.035  talk.religion.misc
> >> body=mike        0.035  misc.forsale
> >> body=windows     0.034  misc.forsale
> >> body=just        0.032  sci.crypt
> >> body=supports    0.032  talk.politics.mideast
> >> body=x           0.032  talk.religion.misc
> >> body=do          0.029  rec.motorcycles
> >> body=university  0.028  comp.sys.mac.hardware
> >> body=slagle      0.028  rec.sport.hockey
> >>
> >> I prefer the results from MIA :) Anyway, I know you are busy. If there is
> >> anything I can do to help, let me know. Still getting familiar with the
> >> code, but could help out with some guidance.
> >>
> >> Thanks a lot,
> >> Chris
> >>
> >> On Dec 17, 2010, at 7:37 PM, Ted Dunning wrote:
> >>
> >>> Hard to say what changed just offhand. I was tweaking the SGD code pretty
> >>> regularly as I learned from the results users were getting. I should look
> >>> at the history to review what happened... some changes may not have been
> >>> good.
> >>>
> >>> On Fri, Dec 17, 2010 at 5:28 PM, Chris Schilling <[email protected]> wrote:
> >>>
> >>>> Thanks for the answers, Ted. I'll take a look inside the dissector. I was
> >>>> just wondering because the results are quite a bit different from what's in
> >>>> the book - Listing 15.9. Here are those results (where words have
> >>>> weights > 1).
> >>>>
> >>>> body=space     2.1  sci.space
> >>>> body=sale      1.9  misc.forsale
> >>>> body=car       1.9  rec.autos
> >>>> body=windows   1.8  comp.os.ms-windows.misc
> >>>> body=mac       1.7  comp.sys.mac.hardware
> >>>> body=bike      1.7  rec.motorcycles
> >>>> body=apple     1.5  comp.sys.mac.hardware
> >>>> body=gun       1.5  talk.politics.guns
> >>>> body=baseball  1.5  rec.sport.baseball
> >>>> body=graphics  1.5  comp.graphics
> >>>>
> >>>> I guess I mostly want to understand what changed. Again, I'll take a look
> >>>> at the dissector, because the results of the training look pretty good.
