Chris,
This looks better:
0.58 189545.00 63692.00 3551.55 1.0000002e-08 1.0007058e-08 3000 -1.447 69.79 none
body=gun 1.3 talk.politics.guns 16.0 -0.2414000554631249 17.0 -0.129711508160663
body=windows 1.2 comp.os.ms-windows.misc 3.0 -0.19417418927173208 14.0 -0.16734214222498917
body=sale 1.2 misc.forsale 1.0 -0.141236520301033 4.0 -0.13078372920403203
body=car 1.2 rec.autos 13.0 -0.15182947211484465 10.0 -0.12953193026882154
body=bike 1.1 rec.motorcycles 6.0 -0.1188138958409118 5.0 -0.10294447109840156
body=x 1.1 comp.windows.x 0.0 0.16208435089363293 13.0 -0.1515644646592169
body=israel 1.0 talk.politics.mideast 15.0 -0.14829080862283103 18.0 -0.12856657991660764
body=space 1.0 sci.space 4.0 -0.13411179253228006 15.0 -0.13059284295356896
body=mac 0.9 comp.sys.mac.hardware 11.0 -0.15080978041551327 4.0 -0.08952300323041343
body=apple 0.9 comp.sys.mac.hardware 2.0 -0.08917827214495963 12.0 -0.08678780456560442
body=god 0.9 soc.religion.christian 16.0 -0.31170158967758943 12.0 -0.23551822033335715
There was a bug introduced into the ModelDissector that caused it to show
you the least interesting features rather than the most interesting.
Patch is forthcoming.
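
For anyone following along, here is a minimal sketch of how a reversed ordering produces this kind of symptom. It is not the ModelDissector code and not the forthcoming patch; the WeightRanking class, the Weight type, and the "example=negative" entry are made up for illustration, and it assumes "interesting" means largest absolute weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a dissector's ranking step; not Mahout's ModelDissector.
class WeightRanking {

    /** A feature name paired with its learned weight (illustrative type). */
    static final class Weight {
        final String feature;
        final double value;

        Weight(String feature, double value) {
            this.feature = feature;
            this.value = value;
        }
    }

    // Orders by magnitude, largest first: the "most interesting" features.
    static final Comparator<Weight> BY_MAGNITUDE_DESC =
        Comparator.comparingDouble((Weight w) -> Math.abs(w.value)).reversed();

    /** Returns the first n weights under the given ordering; the input list is not modified. */
    static List<Weight> top(List<Weight> weights, int n, Comparator<Weight> order) {
        List<Weight> sorted = new ArrayList<>(weights);
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Weight> weights = Arrays.asList(
            new Weight("body=rt", 0.042),
            new Weight("body=gun", 1.3),
            new Weight("body=computer", 0.039),
            new Weight("example=negative", -0.9)); // made-up entry to show sign handling

        // Intended ordering: body=gun first, then example=negative.
        for (Weight w : top(weights, 2, BY_MAGNITUDE_DESC)) {
            System.out.println("most:  " + w.feature + " " + w.value);
        }

        // Same comparator reversed: body=computer first, then body=rt.
        for (Weight w : top(weights, 2, BY_MAGNITUDE_DESC.reversed())) {
            System.out.println("least: " + w.feature + " " + w.value);
        }
    }
}
```

Sorting on the absolute value keeps strongly negative weights near the top, which sorting on the true value would not; flipping the comparator is enough to turn a list of 1.x weights into a list of 0.0x weights.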
On Tue, Dec 28, 2010 at 1:05 PM, Chris Schilling <[email protected]> wrote:
> Hey Ted,
>
> I went back in time a bit and found a version which returned reasonable
> looking results (at least results which are comparable to those in the
> book). I ran 'svnversion .' and the older (apparently working) version
> returned 1004406M whereas the trunk version I am using is 1050223M. In any
> case, the files in core/o.a.m.classifier.sgd are dated October 4th. It
> looks like between the 4th of Oct and December 7th there was some
> refactoring going on. For instance, the encoders were moved to the vectors
> package (as opposed to the vectorizer.encoders package). I spent a little
> time comparing diffs in the core sgd package but not enough time to discover
> what could be causing this behavior.
>
> I hope this helps.
> Chris
>
>
>
> On Dec 20, 2010, at 4:25 PM, Ted Dunning wrote:
>
> > Yeah... it looks like I really need to jump into this. These results are
> > not right.
> >
> > On Mon, Dec 20, 2010 at 2:11 PM, Chris Schilling <[email protected]> wrote:
> >
> >> Hey Ted,
> >>
> >> Just FYI,
> >>
> >> I changed the Weight subclass of the ModelDissector to sort by true value
> >> (rather than absolute value) and reran over the 20 newsgroups data. Here
> >> are the results of the dissector function:
> >>
> >> body=rt 0.042 comp.sys.mac.hardware
> >> body=computer 0.039 sci.electronics
> >> body=seem 0.035 talk.religion.misc
> >> body=mike 0.035 misc.forsale
> >> body=windows 0.034 misc.forsale
> >> body=just 0.032 sci.crypt
> >> body=supports 0.032 talk.politics.mideast
> >> body=x 0.032 talk.religion.misc
> >> body=do 0.029 rec.motorcycles
> >> body=university 0.028 comp.sys.mac.hardware
> >> body=slagle 0.028 rec.sport.hockey
> >>
> >> I prefer the results from MIA :) Anyway, I know you are busy. If there is
> >> anything I can do to help, let me know. Still getting familiar with the
> >> code, but could help out with some guidance.
> >>
> >> Thanks a lot,
> >> Chris
> >>
> >> On Dec 17, 2010, at 7:37 PM, Ted Dunning wrote:
> >>
> >>> Hard to say what changed just off hand. I was tweaking the SGD code pretty
> >>> regularly as I learned from the results users were getting. I should look
> >>> at the history to review what happened... some changes may not have been
> >>> good.
> >>>
> >>> On Fri, Dec 17, 2010 at 5:28 PM, Chris Schilling <[email protected]> wrote:
> >>>
> >>>> Thanks for the answers, Ted. I'll take a look inside the dissector. I was
> >>>> just wondering because the results are quite a bit different from what's
> >>>> in the book - Listing 15.9. Here are those results (where words have
> >>>> weights > 1).
> >>>>
> >>>> body=space 2.1 sci.space
> >>>> body=sale 1.9 misc.forsale
> >>>> body=car 1.9 rec.autos
> >>>> body=windows 1.8 comp.os.ms-windows.misc
> >>>> body=mac 1.7 comp.sys.mac.hardware
> >>>> body=bike 1.7 rec.motorcycles
> >>>> body=apple 1.5 comp.sys.mac.hardware
> >>>> body=gun 1.5 talk.politics.guns
> >>>> body=baseball 1.5 rec.sport.baseball
> >>>> body=graphics 1.5 comp.graphics
> >>>>
> >>>>
> >>>> I guess I mostly want to understand what changed. Again, I'll take a look
> >>>> at the dissector, because the results of the training look pretty good.
> >>>>
> >>
> >>
>
>