I am not an expert, but it does seem like the main novel thing is how (and how many) multi-word tokens are generated. I have been using multi-word tokens with bogofilter for years and it does help. Of course, bogofilter only combines adjacent words, so perhaps OP's way of combining words could yield an increase in accuracy, at the expense of processing time.
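To make the adjacent-versus-combined distinction concrete, here is a rough Python sketch. It is not bogofilter's actual tokenizer, and the windowed scheme is only my guess at the kind of combining OP might mean; the function names and the window size are made up for illustration.

```python
def adjacent_pairs(words):
    """Pair each word with the word immediately after it."""
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def windowed_pairs(words, window=4):
    """Pair each word with every later word inside a sliding window.

    Hypothetical scheme, shown only to illustrate how quickly the
    token count grows once non-adjacent words are combined.
    """
    pairs = []
    for i, first in enumerate(words):
        # The window covers the word itself plus the next (window - 1) words.
        for second in words[i + 1 : i + window]:
            pairs.append(f"{first} {second}")
    return pairs


words = "buy cheap pills online now".split()
print(adjacent_pairs(words))
print(len(adjacent_pairs(words)), len(windowed_pairs(words)))
```

Even on a five-word message the windowed version produces more than twice as many tokens as the adjacent-only version, which is where the extra processing time (and database size) would come from.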
The stuff about not-matching rather than matching seems like nonsense. Not to sound mean, but this is not the first time OP has come out with the latest and greatest revolution in spam blocking. :) I admire his dedication, in any case!
