Re: [Scikit-learn-general] Dimension Requirements on train_test_split and GridSearchCV

David Brough Tue, 06 Jan 2015 10:17:45 -0800

Hey Guys,

While working on a temporary fix for this dimension issue in my package
(PyMKS), I found that the mse metric (mean_squared_error) from
sklearn.metrics has also changed since the summer and now applies the same
dimension check. Was this also fixed by #3987
(https://github.com/scikit-learn/scikit-learn/pull/3987)?
Is there a reason why the dimensions should be checked for metrics?


Below are a link to the code and part of the traceback from trying to use
mse from sklearn.metrics as the 'scoring' parameter in GridSearchCV.

http://openmaterials.github.io/pymks/rst/cahn_hilliard_Legendre.html#optimizing-the-number-of-local-states

/home/david/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters)
   1238     else:
   1239         estimator.fit(X_train, y_train, **fit_params)
-> 1240     test_score = _score(estimator, X_test, y_test, scorer)
   1241     if return_train_score:
   1242         train_score = _score(estimator, X_train, y_train, scorer)

/home/david/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _score(estimator, X_test, y_test, scorer)
   1294         score = scorer(estimator, X_test)
   1295     else:
-> 1296         score = scorer(estimator, X_test, y_test)
   1297     if not isinstance(score, numbers.Number):
   1298         raise ValueError("scoring must return a number, got %s (%s) instead."

/home/david/anaconda/lib/python2.7/site-packages/sklearn/metrics/scorer.pyc in __call__(self, estimator, X, y_true)
     78         """
     79         y_pred = estimator.predict(X)
---> 80         return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
     81
     82

<ipython-input-5-e74fc420ba41> in <lambda>(a, b)
     19                   'basis': [continuousBasis, legendreBasis]}
     20 model = MKSRegressionModel(continuousBasis)
---> 21 scoring = metrics.make_scorer(lambda a, b: -mse(a, b))
     22 fit_params = {'size': size}
     23 gs = GridSearchCV(model, params_to_tune, cv=5, scoring=scoring, fit_params=fit_params).fit(X_train, y_train)

/home/david/anaconda/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in mean_squared_error(y_true, y_pred, sample_weight)
   2217
   2218     """
-> 2219     y_type, y_true, y_pred = _check_reg_targets(y_true, y_pred)
   2220     return np.average(((y_pred - y_true) ** 2).mean(axis=1),
   2221                       weights=sample_weight)

/home/david/anaconda/lib/python2.7/site-packages/sklearn/metrics/metrics.pyc in _check_reg_targets(y_true, y_pred)
     63         Estimated target values.
     64     """
---> 65     y_true, y_pred = check_arrays(y_true, y_pred)
     66
     67     if y_true.ndim == 1:

/home/david/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_arrays(*arrays, **options)
    285             if not allow_nd and array.ndim >= 3:
    286                 raise ValueError("Found array with dim %d. Expected <= 2" %
--> 287                                  array.ndim)
    288
    289         if copy and array is array_orig:

ValueError: Found array with dim 3. Expected <= 2
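
For reference, here is a minimal sketch of one possible workaround on the
metric side, assuming the targets and predictions have shape
(n_samples, d1, d2, ...). The helper name mse_nd is hypothetical and is not
part of scikit-learn or PyMKS. It flattens each sample to a row so the arrays
are 2-D, then repeats the arithmetic shown in the traceback (per-sample mean,
then the average over samples). It only sidesteps the check in the metric; it
does nothing about the checks in train_test_split or GridSearchCV themselves.

```python
import numpy as np


def mse_nd(y_true, y_pred):
    """Mean squared error for arrays with more than two dimensions.

    Hypothetical helper (not a scikit-learn function): each sample is
    reshaped to a flat row so that both arrays become 2-D.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("shape mismatch: %s vs %s"
                         % (y_true.shape, y_pred.shape))
    n_samples = y_true.shape[0]
    # (n_samples, d1, d2, ...) -> (n_samples, d1 * d2 * ...)
    y_true_2d = y_true.reshape(n_samples, -1)
    y_pred_2d = y_pred.reshape(n_samples, -1)
    # Per-sample mean of squared errors, then the average over samples,
    # mirroring the unweighted case of the expression in the traceback.
    return np.average(((y_pred_2d - y_true_2d) ** 2).mean(axis=1))


# Small check on 3-D input: every element differs by 1, so the error is 1.
y_true = np.zeros((4, 3, 3))
y_pred = np.ones((4, 3, 3))
assert mse_nd(y_true, y_pred) == 1.0
```

Something like metrics.make_scorer(mse_nd, greater_is_better=False) could
then stand in for the lambda on line 21 of the traceback, though I have not
verified that against the version installed here.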

Thanks,

David



On Sat, Dec 27, 2014 at 12:58 PM, Andreas Mueller <notificati...@github.com>
wrote:

> Closed #3984 <https://github.com/scikit-learn/scikit-learn/issues/3984>.
>


On Fri, Dec 19, 2014 at 11:55 AM, Andy <t3k...@gmail.com> wrote:

>  Never mind, I got confused.
> The commit was only for cross_val_score.
> We should have added an "allow_nd=True" to train_test_split.
> I just posted #3986 which will allow this to still run in master, which I
> broke :-/
>
> I'm not sure it is worth doing a bug-fix release for that now. I hope we
> can release soon ;)
>
>
>
> On 12/18/2014 01:11 PM, David Brough wrote:
>
>   Hi,
>
>  I am developing a Python package that uses machine learning to
> speed up the optimization of materials development (pymks.org). This
> package is built on top of sklearn. We have an example in our documentation
> where we use train_test_split and GridSearchCV to search the
> parameter space (
> http://openmaterials.github.io/pymks/rst/cahn_hilliard_Legendre.html#optimizing-the-number-of-local-states).
>
>
>  This example worked when it was created this summer, but it is now
> broken. It seems that the API for these two functions has changed. Why are
> the dimensions of the input arrays checked by both train_test_split and
> GridSearchCV? It seems like the dimensions of the input arrays are
> irrelevant to those functions.
>
>  Thanks,
>
> David
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
