I've been trying to tune my leave estimation strategy by solving for the
values of my estimation variables that minimize the sum of squared errors
between my strategy's estimates and all the leaves in Quackle's
"superleaves" file.
So far my variables include only:
-single letter values
-double/triple/quad-letter penalties
-vowel/consonant imbalance penalties
-bonuses for number of tiles in CANISTER
-a couple of letter pair values (QU, YY, IY, FF, ING)
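
To make the variable list above concrete, here is a minimal Python sketch of how a leave might be turned into one term per variable. It is an illustration under assumptions, not the exact encoding used for the fit: the feature names, the choice of abs(vowels - consonants) as the imbalance measure, counting distinct CANISTER letters, and treating ING as "the leave holds I, N and G" are all hypothetical.

```python
from collections import Counter

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
VOWELS = set("AEIOU")
CANISTER = set("CANISTER")
COMBOS = ["QU", "YY", "IY", "FF", "ING"]  # the combinations listed in the post


def has_combo(counts, combo):
    """True if the leave holds every tile of the combo (order is ignored)."""
    return all(counts[ch] >= n for ch, n in Counter(combo).items())


def leave_features(leave):
    """Map a leave such as 'INGS' to a sparse dict of feature name -> value.

    Assumed encoding (hypothetical): one count per tile, one count per
    double/triple/quad letter, a single imbalance term, a single CANISTER
    term, and a 0/1 indicator per letter combination.
    """
    counts = Counter(leave.upper())
    feats = {}
    for ch, c in counts.items():
        feats["tile_" + ch] = c  # single letter values
        if c >= 2:  # double/triple/quad-letter penalties
            name = {2: "double", 3: "triple"}.get(c, "quad")
            feats[name] = feats.get(name, 0) + 1
    vowels = sum(c for ch, c in counts.items() if ch in VOWELS)
    # Non-letter tiles (for example a blank written as '?') count as neither.
    consonants = sum(
        c for ch, c in counts.items() if ch in LETTERS and ch not in VOWELS
    )
    feats["vc_imbalance"] = abs(vowels - consonants)
    feats["canister"] = sum(1 for ch in CANISTER if counts[ch] > 0)
    for combo in COMBOS:
        if has_combo(counts, combo):
            feats["combo_" + combo] = 1
    return feats
```

Each distinct feature name would become one column of the least-squares problem, so the number of terms stays equal to the number of names this function can produce.
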
The results aren't acceptable yet: 21,000 of the 148,000 racks of five or
fewer letters have errors over 5. But I would like to keep the number of
terms manageable if possible.
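
For anyone wanting to reproduce this kind of count, the fit and the error tally can be sketched as below. This assumes the model is linear in its terms, that NumPy is available, and that X holds one row of term values per leave and y the corresponding superleaves values; the function name and the toy numbers are made up for illustration and are not Quackle data.

```python
import numpy as np


def fit_and_count(X, y, threshold=5.0):
    """Least-squares fit of y ~ X @ w, plus the number of large errors.

    X: (n_leaves, n_terms) array of term values, y: (n_leaves,) targets.
    Returns the fitted weights, the per-leave errors (estimate - target),
    and how many leaves have an absolute error above the threshold.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # lstsq minimizes the sum of squared errors ||X @ w - y||^2.
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    errors = X @ weights - y
    n_bad = int(np.sum(np.abs(errors) > threshold))
    return weights, errors, n_bad


# Made-up toy data: two terms, four "leaves" that the model fits exactly.
X_toy = [[1, 0], [0, 1], [1, 1], [2, 0]]
y_toy = [1.0, 2.0, 3.0, 2.0]
weights, errors, n_bad = fit_and_count(X_toy, y_toy)
print(weights, n_bad)
```

Sorting the leaves by absolute error after such a fit is one way to see which tile combinations the current terms miss, which may help decide which extra terms are worth adding.
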
Has anyone else done a similar analysis? Are complete details of Maven's
letter pair values available somewhere?
Thanks,
Eugene d'Eon