Whoops, I should have checked that a bit better =)
The regularization needs to be added to "S2" and not "S".
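Something like this instead (a rough sketch; I'm assuming S2 is the vector of
covariance eigenvalues that qda.py recovers from the singular values of the
centered class data):

S2 = (S ** 2) / (len(Xg) - 1)          # eigenvalues of the class covariance
S2 = (1 - reg_param) * S2 + reg_param  # shrink the spectrum, not the singular values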
On Sun, Jul 7, 2013 at 4:55 PM, Sergey Feldman <sergeyfeld...@gmail.com> wrote:
> Howdy,
>
> I noticed that the sklearn library's QDA doesn't have any regularization. In
> my experience, a little covariance regularization can mean the difference
> between total failure due to numerical issues and surprisingly high
> classification accuracy.
>
> Adding something basic would be as simple as changing (in qda.py)
>
> U, S, Vt = np.linalg.svd(Xgc, full_matrices=False)
>
> to
>
> U, S, Vt = np.linalg.svd(Xgc, full_matrices=True)
> S = (1-reg_param)*S + reg_param
>
> The reg_param parameter would be set to 0 by default.
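>
> From the user's side this might look like (hypothetical signature, just to
> make the proposal concrete):
>
> from sklearn.qda import QDA
>
> clf = QDA(reg_param=0.01)  # reg_param=0.0 reproduces the current behavior
> clf.fit(X_train, y_train)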
>
>
> In addition, I could also add more sophisticated regularization that tries
> to set reg_param automatically, using some work my former advisor did in
> 2007 (http://jmlr.org/papers/volume8/srivastava07a/srivastava07a.pdf),
> as follows:
>
> U, S, Vt = np.linalg.svd(Xgc, full_matrices=True)
> q = n_features + 3
>
> reg_param = float(q) / (len(Xg) + q)  # cast so this isn't integer division
>
> mult = np.median(S) / q  # more robust than np.mean(S) / q
>
> S = (1 - reg_param) * S + reg_param * mult
>
>
> This would kick in when the flag "auto_regularization" is set to True (and
> would supersede whatever reg_param is set to).
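>
> To make the numbers concrete: with n_features = 10 and a class of 20
> samples, q = 13 and reg_param = 13 / (20 + 13) ≈ 0.39, so smaller classes
> get shrunk more aggressively toward the scalar target mult.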
>
>
>
> Finally, on line 193 of qda.py, in the predict_proba function, we have:
>
> likelihood = np.exp(values - values.min(axis=1)[:, np.newaxis])
>
> Was this actually supposed to be max? Imagine we have 2 classes, and the
> log-likelihoods are -600 and -1600. The min is -1600, and subtracting it
> yields:
>
> -600 - (-1600) = 1000
> -1600 - (-1600) = 0
>
> np.exp(1000), problematically, is inf.
>
> However, using np.max instead of np.min you get:
>
> -600 - (-600) = 0
> -1600 - (-600) = -1000
>
> and then np.exp gives 1 and 0 respectively - which is what we want. In
> other words, we care about getting the larger likelihoods right, and are
> not as worried about likelihoods that are much smaller than the max.
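>
> A quick check with the numbers above makes the overflow concrete:
>
> import numpy as np
>
> values = np.array([[-600.0, -1600.0]])  # log-likelihoods: one sample, two classes
> np.exp(values - values.min(axis=1)[:, np.newaxis])  # array([[ inf,  1.]]) -- overflows
> np.exp(values - values.max(axis=1)[:, np.newaxis])  # array([[ 1.,  0.]]) -- stable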
>
> Let me know if these 3 changes/additions sound reasonable. And thanks for
> all the wonderful work on sklearn!
>
> Cheers,
> sf
>
>