Gael,
Ok, thanks for letting me know.
Olivier,
Do those Pipeline objects save the fitted models or do they just save the
steps that were taken? I can't really tell from the documentation.
On Tue, Jul 30, 2013 at 9:50 AM, <[email protected]> wrote:
> Send Scikit-learn-general mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Scikit-learn-general digest..."
>
>
> Today's Topics:
>
> 1. Re: Finding the Features a Model Used? (Wifi Gi)
> 2. Re: Finding the Features a Model Used? (Gael Varoquaux)
> 3. Re: Finding the Features a Model Used? (Olivier Grisel)
> 4. Trouble Using Cross Validation (Wifi Gi)
> 5. Re: Trouble Using Cross Validation (Lars Buitinck)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 29 Jul 2013 10:01:06 -0600
> From: Wifi Gi <[email protected]>
> Subject: Re: [Scikit-learn-general] Finding the Features a Model Used?
> To: [email protected]
> Message-ID:
> <
> caccjtvkr46a8gbopjuvmj9btvtz5in584fs_rd9yp+mi_f5...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Olivier,
> I know what you're referring to, but that uses the feature selector, not
> the model (with the model being a classifier like a Decision Tree or
> Gaussian Bayes). These models were saved quite a while ago, so I don't have
> access to whatever feature selector object they used. That was why I
> specifically asked if there was anything like this in the *model*.
>
> On Mon, Jul 29, 2013 at 9:54 AM, <[email protected]> wrote:
>
> >
> > Today's Topics:
> >
> > 1. Re: Feature freeze (Andreas Mueller)
> > 2. Re: Feature freeze (Olivier Grisel)
> > 3. Re: (no subject) (Alexandre ABRAHAM)
> > 4. Re: (no subject) (Gael Varoquaux)
> > 5. Re: Feature freeze (Gael Varoquaux)
> > 6. Re: Gracefully modeling missing values (nan's) in CART
> > (classification or regression trees). (Olivier Grisel)
> > 7. Finding the Features a Model Used? (Wifi Gi)
> > 8. Re: Finding the Features a Model Used? (Olivier Grisel)
> >
> > Message: 7
> > Date: Mon, 29 Jul 2013 09:47:21 -0600
> > From: Wifi Gi <[email protected]>
> > Subject: [Scikit-learn-general] Finding the Features a Model Used?
> > To: [email protected]
> > Message-ID:
> > <
> > caccjtvk8zjaxzbwwauresx1ce76prqm_hgu1-7gucjodw1w...@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > I've saved a number of models by pickling them to disk, so that I can use
> > them multiple times on different data. Many of these models were built
> > using feature selection, so they don't use all the features of the data
> > set (i.e. 19 rather than 22, for example). Are the selected features
> > stored anywhere in the model?
> >
> > I can't use the models on the data as I would like because the data has
> > more features, and the number of features in the input must match the
> > number of features in the model. If I can just find the features the model
> > is using, it's trivial to strip them out of the input data, but so far I
> > haven't found anything that gives me this information.
> >
> > ------------------------------
> >
> > Message: 8
> > Date: Mon, 29 Jul 2013 17:54:02 +0200
> > From: Olivier Grisel <[email protected]>
> > Subject: Re: [Scikit-learn-general] Finding the Features a Model Used?
> > To: scikit-learn-general <[email protected]>
> > Message-ID:
> > <CAFvE7K5ViQ4gyYF=uae5qNYvv9a3q7ASdR3d0LzW357p5nY=
> > [email protected]>
> > Content-Type: text/plain; charset=UTF-8
> >
> > There is a `get_support` method on the feature selector. It returns a
> > boolean mask over the feature space unless you pass `indices=True` to
> > get an array of integer indices (usable for fancy indexing) instead.
> >
> > See
> > http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFpr.html#sklearn.feature_selection.SelectFpr.get_support
> > for instance.
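
A minimal sketch of the `get_support` call described above, assuming the fitted selector object is still available. The synthetic data, the choice of `SelectKBest` with `f_classif`, and `k=19` are illustrative assumptions, not what the saved models actually used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data: 22 features, as in the question.
X, y = make_classification(n_samples=200, n_features=22,
                           n_informative=5, random_state=0)

# Hypothetical selector keeping 19 of the 22 features.
selector = SelectKBest(f_classif, k=19).fit(X, y)

mask = selector.get_support()             # boolean array, one entry per input feature
idx = selector.get_support(indices=True)  # integer column indices of the kept features

# Either form can be used to cut new data down to the selected columns.
X_reduced = X[:, idx]
```

Applying `selector.transform(X)` should give the same columns as `X[:, idx]`.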
> >
> >
> >
> > ------------------------------
> >
> >
> >
> >
> > _______________________________________________
> > Scikit-learn-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
> >
> > End of Scikit-learn-general Digest, Vol 42, Issue 99
> > ****************************************************
> >
>
> ------------------------------
>
> Message: 2
> Date: Mon, 29 Jul 2013 18:02:38 +0200
> From: Gael Varoquaux <[email protected]>
> Subject: Re: [Scikit-learn-general] Finding the Features a Model Used?
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> On Mon, Jul 29, 2013 at 10:01:06AM -0600, Wifi Gi wrote:
> > I know what you're referring to, but that uses the feature selector, not the
> > model (with the model being a classifier like a Decision Tree or Gaussian
> > Bayes). These models were saved quite a while ago, so I don't have access to
> > whatever feature selector object they used. That was why I specifically asked
> > if there was anything like this in the model.
>
> If you haven't saved the feature selector, the information is lost. Sorry
>
> G
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 29 Jul 2013 18:32:07 +0200
> From: Olivier Grisel <[email protected]>
> Subject: Re: [Scikit-learn-general] Finding the Features a Model Used?
> To: scikit-learn-general <[email protected]>
> Message-ID:
> <
> cafve7k7cz_w1utkn7gos6dk-pp-p6qfge5i6ag94i01vffy...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> 2013/7/29 Wifi Gi <[email protected]>:
> > Olivier,
> > I know what you're referring to, but that uses the feature selector, not the
> > model (with the model being a classifier like a Decision Tree or Gaussian
> > Bayes). These models were saved quite a while ago, so I don't have access to
> > whatever feature selector object they used.
>
> The model without the feature selector is useless to make any
> prediction. The two should always be used together, for instance by
> using a Pipeline. You can pickle a complete trained pipeline using
> joblib.
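
A rough sketch of the Pipeline-plus-joblib approach suggested above. The assumptions are loud ones: it uses the standalone `joblib` package from current installs (the traceback further down shows it bundled under `sklearn/externals/joblib` at the time), and the step names, `SelectKBest(k=19)`, `GaussianNB`, and the synthetic data are all illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real data.
X, y = make_classification(n_samples=200, n_features=22,
                           n_informative=5, random_state=0)

# Feature selector and classifier are fitted and stored together.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=19)),
    ("clf", GaussianNB()),
])
pipe.fit(X, y)

# Persist the whole fitted pipeline, then load it back.
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
restored = joblib.load(path)

# The restored object still carries the fitted selector, so the kept
# columns can be recovered and raw 22-feature data can be passed in directly.
kept = restored.named_steps["select"].get_support(indices=True)
predictions = restored.predict(X)
```

In this sketch the reloaded pipeline predicts on the full 22-column input without any manual column stripping, because the selection step travels with the classifier.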
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 30 Jul 2013 09:35:15 -0600
> From: Wifi Gi <[email protected]>
> Subject: [Scikit-learn-general] Trouble Using Cross Validation
> To: [email protected]
> Message-ID:
> <CACcJtV=
> [email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm having trouble figuring out how to use the cross validation scoring
> function. As far as I can tell, I'm exactly following the tutorial given
> here:
>
> http://scikit-learn.org/0.13/modules/cross_validation.html#computing-cross-validated-metrics
>
> Paraphrase of code:
>
> from sklearn import naive_bayes
> from sklearn import cross_validation
> from sklearn import metrics
>
> alg = naive_bayes.GaussianNB()
> folds = 10
>
> # ...data loaded here into x and y
>
> score = cross_validation.cross_val_score(alg, x, y, cv=folds,
> scoring=metrics.accuracy_score)
>
> The error I get is:
> Error:
> Traceback (most recent call last):
>   File "pylearn.py", line 431, in <module>
>     l = Learn()
>   File "pylearn.py", line 173, in __init__
>     scores = self.crossVal(x, y, alg, args.cross_val[0])
>   File "pylearn.py", line 241, in crossVal
>     score = cross_validation.cross_val_score(alg, x, y, cv=folds, scoring=metrics.f1_score)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/cross_validation.py", line 1174, in cross_val_score
>     for train, test in cv)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 514, in __call__
>     self.dispatch(function, args, kwargs)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 311, in dispatch
>     job = ImmediateApply(func, args, kwargs)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 135, in __init__
>     self.results = func(*args, **kwargs)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/cross_validation.py", line 1081, in _cross_val_score
>     score = scorer(estimator, X_test, y_test)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/metrics/metrics.py", line 1212, in f1_score
>     pos_label=pos_label, average=average)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/metrics/metrics.py", line 1359, in fbeta_score
>     average=average)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/metrics/metrics.py", line 1635, in precision_recall_fscore_support
>     y_type, y_true, y_pred = _check_clf_targets(y_true, y_pred)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/metrics/metrics.py", line 162, in _check_clf_targets
>     y_true, y_pred = check_arrays(y_true, y_pred, allow_lists=True)
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 181, in check_arrays
>     n_samples = _num_samples(arrays[0])
>   File "/usr/local/lib64/python2.7/site-packages/sklearn/utils/validation.py", line 123, in _num_samples
>     raise TypeError("Expected sequence or array-like, got %r" % x)
> TypeError: Expected sequence or array-like, got GaussianNB()
>
> Could anyone tell me what I'm doing wrong? Does cross_val_score only work
> with certain classifiers?
>
> ------------------------------
>
> Message: 5
> Date: Tue, 30 Jul 2013 17:50:05 +0200
> From: Lars Buitinck <[email protected]>
> Subject: Re: [Scikit-learn-general] Trouble Using Cross Validation
> To: [email protected]
> Message-ID:
> <CAKz-xUdR3W+EBdr5OP9Syk2KB=M2yMHXZmsVXO63nBd0y=_
> [email protected]>
> Content-Type: text/plain; charset=UTF-8
>
> 2013/7/30 Wifi Gi <[email protected]>:
> > score = cross_validation.cross_val_score(alg, x, y, cv=folds,
> > scoring=metrics.accuracy_score)
>
> You're mixing scoring (the new API) with score_func (the old API
> described in the 0.13.1 docs). Use `scoring="accuracy"`.
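
A small sketch of the fix suggested above, assuming a current scikit-learn release where `cross_val_score` lives in `sklearn.model_selection` (the thread uses the older `sklearn.cross_validation` module). The traceback shows the scorer being called as `scorer(estimator, X_test, y_test)`, which would explain why a plain metric function ends up receiving the estimator as its first argument. The synthetic data is an illustrative stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the x and y loaded in the original script.
x, y = make_classification(n_samples=200, n_features=22,
                           n_informative=5, random_state=0)

alg = GaussianNB()
folds = 10

# Pass the scorer by name instead of passing the metric function itself.
scores = cross_val_score(alg, x, y, cv=folds, scoring="accuracy")

# If a metric function has to be used, wrapping it with make_scorer gives
# it the (estimator, X, y) calling convention that cross_val_score expects.
wrapped_scores = cross_val_score(alg, x, y, cv=folds,
                                 scoring=make_scorer(accuracy_score))
```

Both calls return one score per fold, so `scores.mean()` gives the cross-validated accuracy.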
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam
>
>
>
> ------------------------------
>
>
>
>
>
> End of Scikit-learn-general Digest, Vol 42, Issue 100
> *****************************************************
>
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general