Hey Ted,

I went back in time a bit and found a version that returns reasonable-looking
results (at least, results comparable to those in the book).  Running
'svnversion .' on the older (apparently working) version gives 1004406M,
whereas the trunk version I am using gives 1050223M.  In any case, the files in
core/o.a.m.classifier.sgd are dated October 4th.  It looks like some
refactoring went on between October 4th and December 7th.  For instance, the
encoders were moved to the vectors package (as opposed to the
vectorizer.encoders package).  I spent a little time comparing diffs in the
core sgd package, but not enough to discover what could be causing this
behavior.

I hope this helps.  
Chris



On Dec 20, 2010, at 4:25 PM, Ted Dunning wrote:

> Yeah... it looks like I really need to jump into this.  These results are
> not right.
> 
> On Mon, Dec 20, 2010 at 2:11 PM, Chris Schilling <[email protected]> wrote:
> 
>> Hey Ted,
>> 
>> Just FYI,
>> 
>> I changed the Weight subclass of the ModelDissector to sort by true value
>> (rather than absolute value) and reran over the 20 newsgroups data.  Here
>> are the results of the dissector function:
>> 
>> body=rt           0.042   comp.sys.mac.hardware
>> body=computer     0.039   sci.electronics
>> body=seem         0.035   talk.religion.misc
>> body=mike         0.035   misc.forsale
>> body=windows      0.034   misc.forsale
>> body=just         0.032   sci.crypt
>> body=supports     0.032   talk.politics.mideast
>> body=x            0.032   talk.religion.misc
>> body=do           0.029   rec.motorcycles
>> body=university   0.028   comp.sys.mac.hardware
>> body=slagle       0.028   rec.sport.hockey
>> 
>> I prefer the results from MIA :)  Anyway, I know you are busy.  If there is
>> anything I can do to help, let me know.  Still getting familiar with the
>> code, but could help out with some guidance.
>> 
>> Thanks a lot,
>> Chris
>> 
>> On Dec 17, 2010, at 7:37 PM, Ted Dunning wrote:
>> 
>>> Hard to say what changed just off hand.  I was tweaking the SGD code
>>> pretty regularly as I learned from the results users were getting.  I
>>> should look at the history to review what happened... some changes may
>>> not have been good.
>>> 
>>> On Fri, Dec 17, 2010 at 5:28 PM, Chris Schilling
>>> <[email protected]> wrote:
>>> 
>>>> Thanks for the answers Ted.  I'll take a look inside the dissector.  I
>>>> was just wondering because the results are quite a bit different from
>>>> what's in the book - Listing 15.9.  Here are those results (where words
>>>> have weights > 1).
>>>> 
>>>> body=space      2.1   sci.space
>>>> body=sale       1.9   misc.forsale
>>>> body=car        1.9   rec.autos
>>>> body=windows    1.8   comp.os.ms-windows.misc
>>>> body=mac        1.7   comp.sys.mac.hardware
>>>> body=bike       1.7   rec.motorcycles
>>>> body=apple      1.5   comp.sys.mac.hardware
>>>> body=gun        1.5   talk.politics.guns
>>>> body=baseball   1.5   rec.sport.baseball
>>>> body=graphics   1.5   comp.graphics
>>>> 
>>>> 
>>>> I guess I mostly want to understand what changed.  Again, I'll take a
>>>> look at the dissector, because the results of the training look pretty
>>>> good.
>>>> 
>> 
>> 
