In this case these were really bad spams, so APOSTROPHE_TOCC is
just riding on the back of other rules, BLs, and high Bayes.

What I generally look at is the detailed rule performance in
masscheck. If it primarily hits on spams that score in total 1-3

Why not under 5?

If it's close to 5 and there's a limit, that suggests the limit could be increased a bit.

It also needs to take into account the ham hits, which is why having a ham-starved corpus is such a problem.

Are you saying we have a ham-starved corpus?

We have at times in the past. When you're performing analyses like this you need to bear in mind the size of the ham corpus.
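
To make the kind of check discussed above concrete, here is a rough Python sketch. It is not masscheck's own tooling or log format: the Result record, the summarize_rule helper, and the sample messages are all made up for illustration, and the 1-5 score band is just one possible choice. It counts how many of a rule's spam hits fall in a low total-score band, and reports ham hits alongside the ham corpus size so a small ham corpus is visible in the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One scanned message (hypothetical record, not a real masscheck log line)."""
    is_spam: bool          # corpus label: True for spam, False for ham
    score: float           # total score the message received
    rules: frozenset       # names of the rules that hit


def summarize_rule(results, rule, low, high):
    """Summarize where `rule` hits: spam hits in [low, high) and ham hits."""
    results = list(results)
    spam_total = sum(1 for r in results if r.is_spam)
    ham_total = len(results) - spam_total
    spam_hits = [r for r in results if r.is_spam and rule in r.rules]
    ham_hits = sum(1 for r in results if not r.is_spam and rule in r.rules)
    in_band = sum(1 for r in spam_hits if low <= r.score < high)
    return {
        "spam_total": spam_total,
        "ham_total": ham_total,          # keep in view: a tiny ham corpus proves little
        "spam_hits": len(spam_hits),
        "spam_hits_in_band": in_band,    # spam hits where the rule could matter
        "ham_hits": ham_hits,
        "ham_hit_rate": ham_hits / ham_total if ham_total else None,
    }


# Made-up sample data, for illustration only.
sample = [
    Result(True, 25.0, frozenset({"APOSTROPHE_TOCC", "BAYES_99"})),
    Result(True, 3.0, frozenset({"APOSTROPHE_TOCC"})),
    Result(True, 8.0, frozenset({"BAYES_99"})),
    Result(False, 0.5, frozenset({"APOSTROPHE_TOCC"})),
    Result(False, 0.1, frozenset()),
]

summary = summarize_rule(sample, "APOSTROPHE_TOCC", low=1.0, high=5.0)
print(summary)
```

A rule whose spam hits are mostly far above the threshold is riding on other rules; the hits inside the band are the ones where its score could change the outcome. The ham hit rate only means something when ham_total is reasonably large.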

