> On 11 Feb 2015, at 16:31, Andy <t3k...@gmail.com> wrote:
> 
> 
> On 02/11/2015 04:22 PM, Timothy Vivian-Griffiths wrote:
>> Hi Gilles,
>> 
>> Thank you so much for clearing this up for me. So, am I right in thinking 
>> that the feature selection is carried out for every CV fold, and then, once 
>> the best parameters have been found, the pipeline is re-run on the whole 
>> training set in order to get the .best_estimator_?
> Yes.

Well, only if you have `refit=True` in your search. Otherwise you have to take 
care of it yourself.
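To make that concrete, here is a minimal sketch (using the modern scikit-learn import paths, and toy data/parameter names of my own choosing) of a pipeline whose feature-selection step is fit inside every CV fold, and which is then re-fit on the whole training set because `refit=True`:

```python
# Sketch: feature selection happens inside each CV fold; with refit=True
# (the default), the best pipeline is re-fit on the full training set
# afterwards, which is what populates .best_estimator_.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", SVC()),
])
param_grid = {"select__k": [5, 10], "clf__C": [0.1, 1.0]}

search = GridSearchCV(pipe, param_grid, cv=5, refit=True)
search.fit(X, y)

# Ready to use on new data, because refit=True re-fit it on all of X, y:
print(search.best_estimator_)
```

With `refit=False` the search still reports `best_params_`, but you would have to fit a pipeline with those parameters yourself.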

I’m jumping in with a related question. I have a small, noisy dataset where 
I’m mostly interested in interpreting the features, but I also want to justify 
the interpretation with a well-performing classifier. Feature selection works 
well, but I end up with a small number of very sparse features. I’m thinking of 
(manually) taking the union of all features selected in each fold for 
interpretation purposes. That doesn’t seem unreasonable to me, but am I 
missing something?
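For what it’s worth, the union-of-folds idea can be sketched like this; the selector, `k`, and data here are illustrative placeholders, not anything from this thread:

```python
# Sketch: fit the feature selector inside each CV fold and collect the
# union of every feature index it ever picks, for interpretation only
# (the performance estimate should still come from the per-fold models).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, n_features=20,
                           n_informative=5, random_state=0)

selected = set()
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    sel = SelectKBest(f_classif, k=5).fit(X[train_idx], y[train_idx])
    selected |= set(np.flatnonzero(sel.get_support()))

print(sorted(selected))  # indices chosen in at least one fold
```

One caveat: features that appear in only one fold may be noise, so it can help to report per-feature selection counts across folds rather than the bare union.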

Cheers,
Vlad
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general