On Thu, Mar 22, 2012 at 3:35 AM, James Bergstra
<[email protected]> wrote:

> Also, isn't the feature normalization supposed to be done on a
> fold-by-fold basis? If you're doing that, you have a different kernel
> matrix in every fold anyway.

Indeed, if you really want to be clean, you would need to do that,
but I'm pretty sure the estimates of the mean mu and the variance
sigma used for the normalization are about the same whether you
compute them on the union of the train and validation folds or on the
train fold alone. So yes, not normalizing on a fold-by-fold basis is
kind of cheating, but it makes a huge difference if you reuse the
cache. Also, even if you normalize on a fold-by-fold basis, you can
still reuse the cache. Say I want to find the best parameter
combination for "C", "gamma" and "shrinking"; I can carry out the
grid search as follows:

for gamma in gamma_values:
  clear cache
  for shrinking in shrinking_values:
    for C in C_values:
      fit model (possibly with warm start)

So, the outer loop should always be the one over the kernel
parameters, as they are the only ones that invalidate the cache.
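
To make the loop nesting concrete, here is a minimal Python sketch.
The estimator name CachedKernelSVC is a hypothetical stand-in for an
SVM whose kernel cache stays valid as long as gamma is unchanged (it
is not an existing scikit-learn class), and gamma_values,
shrinking_values, C_values and the fold arrays are assumed to be
defined elsewhere:

    from itertools import product

    results = {}
    for gamma in gamma_values:              # kernel parameter: outer loop
        # hypothetical estimator; a new gamma means a fresh, empty cache
        clf = CachedKernelSVC(gamma=gamma)
        for shrinking, C in product(shrinking_values, C_values):
            clf.set_params(shrinking=shrinking, C=C)
            clf.fit(X_train, y_train)       # reuses cached kernel entries
            results[gamma, shrinking, C] = clf.score(X_valid, y_valid)

    best_params = max(results, key=results.get)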

Mathieu

PS: TF-IDF has the same problem as Normalization: in principle, you
must learn the IDF weights on the training fold only, not on the union
of the training fold and the validation fold.
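
For both the normalization and the TF-IDF cases, the per-fold version
boils down to "fit on the training fold, transform the validation
fold". A minimal sketch with the scikit-learn API (train_docs /
valid_docs and train_fold / valid_fold are assumed to be defined):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import StandardScaler

    # IDF weights are learned on the training fold only
    tfidf = TfidfVectorizer()
    X_train_tfidf = tfidf.fit_transform(train_docs)
    X_valid_tfidf = tfidf.transform(valid_docs)

    # same pattern for mean/variance normalization
    scaler = StandardScaler()
    X_train = scaler.fit_transform(train_fold)
    X_valid = scaler.transform(valid_fold)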
