Howdy,
I noticed that sklearn's QDA doesn't have any regularization. In
my experience, a little covariance regularization can mean the difference
between total failure due to numerical issues and surprisingly
high classification accuracy.
Adding something basic would be as simple as changing (in qda.py)
U, S, Vt = np.linalg.svd(Xgc, full_matrices=False)
to
U, S, Vt = np.linalg.svd(Xgc, full_matrices=True)
S = (1-reg_param)*S + reg_param
The reg_param parameter would be set to 0 by default.
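To illustrate the effect, here is a small standalone sketch (not scikit-learn code; the function name is mine) of the shrinkage above, showing that it keeps the singular values bounded away from zero even for a rank-deficient data matrix:

```python
import numpy as np

def shrink_singular_values(X, reg_param=0.0):
    # Shrink the singular values toward 1, as in the proposed change.
    # reg_param is in [0, 1]; 0 reproduces the current (unregularized)
    # behaviour.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    S = (1 - reg_param) * S + reg_param
    return U, S, Vt

rng = np.random.RandomState(0)
X = rng.randn(10, 4)
X[:, 3] = X[:, 0]  # duplicate a column -> rank-deficient covariance

_, S0, _ = shrink_singular_values(X, reg_param=0.0)
_, S1, _ = shrink_singular_values(X, reg_param=0.1)
print(S0.min())  # ~0: this is where the numerical trouble starts
print(S1.min())  # >= 0.1: safely bounded away from zero
```

With reg_param = 0 nothing changes, so the default is fully backward compatible.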
In addition, I could add a more sophisticated regularization that sets
reg_param automatically, using some work my former advisor did in
2007 (here <http://jmlr.org/papers/volume8/srivastava07a/srivastava07a.pdf>),
as follows:
U, S, Vt = np.linalg.svd(Xgc, full_matrices=True)
q = n_features + 3
reg_param = q/(len(Xg) + q)
mult = np.median(S)/q # more robust than np.mean(S)/q
S = (1-reg_param)*S + reg_param*mult
This would occur if the flag "auto_regularization" is set to True (and
supersedes whatever reg_param is set to).
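As a self-contained sketch of that automatic scheme (the function name is mine; the constant q = n_features + 3 and the median heuristic are taken from the snippet above, not from scikit-learn):

```python
import numpy as np

def auto_shrink(Xgc, n_features):
    # Sketch of the proposed automatic regularization (after
    # Srivastava et al., JMLR 2007): the shrinkage weight decreases
    # as the number of samples grows.
    U, S, Vt = np.linalg.svd(Xgc, full_matrices=True)
    q = n_features + 3
    reg_param = q / (len(Xgc) + q)   # data-driven shrinkage weight
    mult = np.median(S) / q          # more robust than np.mean(S) / q
    S = (1 - reg_param) * S + reg_param * mult
    return S, reg_param

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
S, reg_param = auto_shrink(X, n_features=5)
print(reg_param)  # 8 / 58, i.e. mild shrinkage for 50 samples
```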
Finally, on line 193 of qda.py, in the predict_proba function, we have:
likelihood = np.exp(values - values.min(axis=1)[:, np.newaxis])
Was this actually supposed to be max? Imagine we have 2 classes, and the
log-likelihoods are -600 and -1600. The min is -1600, and subtracting it
yields:
-600 - (-1600) = 1000
-1600 - (-1600) = 0
np.exp(1000), problematically, is inf.
However, using np.max instead of np.min, we get:
-600 - (-600) = 0
-1600 - (-600) = -1000
and then np.exp will be 1 and 0 respectively - which is what we expect. In
other words, we are more interested in getting the larger likelihoods
correct, and not as worried about likelihoods that are much smaller than
the max.
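The difference is easy to demonstrate numerically with the two-class example above:

```python
import numpy as np

# Log-likelihoods for the two classes in the example.
values = np.array([[-600.0, -1600.0]])

with np.errstate(over='ignore', under='ignore'):
    # Subtracting the row minimum overflows: exp(1000) -> inf.
    bad = np.exp(values - values.min(axis=1)[:, np.newaxis])
    # Subtracting the row maximum keeps everything in [0, 1];
    # exp(-1000) harmlessly underflows to 0.
    good = np.exp(values - values.max(axis=1)[:, np.newaxis])

print(bad)   # [[inf  1.]]
print(good)  # [[1. 0.]]
```

With the min, any subsequent normalization divides inf by inf and produces NaN probabilities; with the max, the normalization gives the expected 1 and 0.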
Let me know if these 3 changes/additions sound reasonable. And thanks for
all the wonderful work on sklearn!
Cheers,
sf
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general