I think a TCR lambda of 5 is too low for us. This means that we consider 5 FNs to have about the same "cost" as 1 FP, right (reference: http://www.ics.forth.gr/~potamias/mlnia/paper_2.pdf)? I think we have managed okay with such a small value so far only because the score optimizer hasn't really changed how it balances FPs against FNs.
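To make the effect of lambda concrete, here is a minimal sketch. It assumes the usual total cost ratio definition from the Androutsopoulos papers, TCR = N_spam / (lambda * FP + FN). The function name and the message counts are hypothetical, made up purely for illustration; they are not measurements from our corpus.

```python
def tcr(n_spam, fp, fn, lam):
    """Total cost ratio: cost of using no filter (every spam gets through)
    divided by the weighted cost of the filter's errors.
    Assumes TCR = n_spam / (lam * fp + fn)."""
    weighted_cost = lam * fp + fn
    if weighted_cost == 0:
        return float("inf")  # a filter with no errors at all
    return n_spam / weighted_cost


# Hypothetical results for two score sets on a corpus with 1000 spam messages.
N_SPAM = 1000
cautious = {"fp": 1, "fn": 60}    # very few FPs, more FNs
aggressive = {"fp": 5, "fn": 30}  # fewer FNs, more FPs

for lam in (5, 50, 100):
    print(
        f"lambda={lam:>3}  "
        f"cautious TCR={tcr(N_SPAM, lam=lam, **cautious):6.2f}  "
        f"aggressive TCR={tcr(N_SPAM, lam=lam, **aggressive):6.2f}"
    )
```

With these made-up counts, the aggressive score set has the higher TCR at lambda = 5, while the cautious one has the higher TCR at lambda = 50 and 100. In other words, the choice of lambda can decide which set of scores looks "better".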
I think the value should be somewhere between 10 and 500. I'm using 50 for the moment.

The balance is all wrong in the perceptron (too many FPs per FN), but I believe I found a reasonably good way to fix it: having the perceptron optimize around a threshold lower than 5.0. With a lambda of 5.0 I can't really prove that this helps, but when I eyeballed the resulting FP/FN numbers they seemed much better to me, and they *are* better when measured with a TCR lambda of 50 (which I think is closer to the mark).

Another data point: Craig Hughes used to talk about having a FP-to-FN ratio of 100 as a goal. I think a lambda of 100 is closer to what we want than 5.

I realize the Androutsopoulos papers seem to imply that a lower number is correct (although I could make a case that they actually don't, because foldering is actually worse than sending TMDA-style bounces **once your accuracy reaches the level we're now at**), but I think we need to go with our gut here until someone whips out some real economics research. :-)

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
