Hi all,

I would like to add a "combiner" class that works with Pipeline to let users
augment the output of scikit-learn's text feature extraction (or other feature
extraction steps). For example, after applying CountVectorizer, it is sometimes
desirable to augment the resulting matrix with additional features. Unless I am
missing something, this is not easily done when the count vectorization is part
of a pipeline, especially if CountVectorizer parameters such as min_df are being
optimized along with downstream model parameters.
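
To make the idea concrete, here is a rough sketch of what such a class might
look like. The name FeatureCombiner, its two constructor arguments, and the
doc_lengths helper are all hypothetical placeholders, not existing
scikit-learn API. The sketch assumes the extra features can be computed from
the raw documents by a plain function.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class FeatureCombiner(BaseEstimator, TransformerMixin):
    """Hypothetical combiner: vectorizer output plus extra feature columns."""

    def __init__(self, vectorizer=None, extra_features=None):
        # Stored unchanged so get_params/set_params can reach the nested
        # vectorizer (e.g. features__vectorizer__min_df).
        self.vectorizer = vectorizer
        self.extra_features = extra_features

    def _extra(self, docs):
        # extra_features is assumed to return one row of numbers per document.
        extra = np.asarray(self.extra_features(docs), dtype=float)
        return sparse.csr_matrix(extra.reshape(len(docs), -1))

    def fit(self, docs, y=None):
        self.vectorizer.fit(docs, y)
        return self

    def transform(self, docs):
        counts = self.vectorizer.transform(docs)
        # Append the extra columns to the right of the count matrix.
        return sparse.hstack([counts, self._extra(docs)], format="csr")


def doc_lengths(docs):
    # Illustrative extra feature: number of whitespace-separated words.
    return [[len(d.split())] for d in docs]


docs = ["the cat sat", "the dog sat down", "a cat and a dog", "the bird flew away"]
labels = [0, 1, 0, 1]

pipe = Pipeline([
    ("features", FeatureCombiner(CountVectorizer(), doc_lengths)),
    ("clf", LogisticRegression()),
])

# The nested vectorizer parameter stays tunable alongside the model's.
pipe.set_params(features__vectorizer__min_df=2)
pipe.fit(docs, labels)
combined = pipe.named_steps["features"].transform(docs)
```

Because the vectorizer is an ordinary constructor parameter, a grid search
over the pipeline could list features__vectorizer__min_df next to clf__C in
the same parameter grid.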

Once I have written the code for this class, what is the easiest way to get it
reviewed and incorporated into scikit-learn?

Thanks,
Mike Kneier



_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
