Hi all, I would like to add a "combiner" class that works with Pipeline and lets users augment the output of scikit's text feature extraction (or of other feature extraction steps). For example, after applying CountVectorizer, it is sometimes desirable to append additional features to the resulting matrix. Unless I am missing something, this is not easily done when the count vectorization is a step in a pipeline, especially if CountVectorizer parameters such as min_df are being optimized together with the parameters of the downstream model.
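To make the idea concrete, here is a rough sketch of the behaviour I have in mind. The name FeatureCombiner and the two toy transformers (ToyCounts as a stand-in for CountVectorizer, DocLength as an "extra feature") are placeholders made up for illustration, not existing scikit classes. The sketch uses plain Python lists so that it runs on its own; a real version would presumably stack sparse matrices instead.

```python
class FeatureCombiner:
    """Hypothetical combiner: fits several transformers on the same input
    and concatenates their outputs column-wise."""

    def __init__(self, transformers):
        # List of (name, transformer) pairs; each transformer is assumed
        # to expose fit(X, y) and transform(X).
        self.transformers = transformers

    def fit(self, X, y=None):
        for _, transformer in self.transformers:
            transformer.fit(X, y)
        return self

    def transform(self, X):
        # One block of rows per transformer, all with the same row count.
        blocks = [t.transform(X) for _, t in self.transformers]
        combined = []
        for rows in zip(*blocks):
            merged = []
            for row in rows:
                merged.extend(row)  # append this block's columns
            combined.append(merged)
        return combined

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class ToyCounts:
    """Toy stand-in for CountVectorizer: word counts over a learned vocabulary."""

    def fit(self, X, y=None):
        self.vocabulary_ = sorted({word for doc in X for word in doc.split()})
        return self

    def transform(self, X):
        return [[doc.split().count(w) for w in self.vocabulary_] for doc in X]


class DocLength:
    """Toy extra feature: number of characters in each document."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[len(doc)] for doc in X]


docs = ["a b a", "b c"]
combiner = FeatureCombiner([("counts", ToyCounts()), ("length", DocLength())])
features = combiner.fit_transform(docs)
print(features)
```

Because the combiner itself follows the fit/transform convention, it could sit as a single step in a pipeline, so the vectorizer's parameters would stay reachable for tuning alongside the downstream model's.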
After I have written code for this class, what is the easiest way to get it reviewed/incorporated into scikit?

Thanks,
Mike Kneier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general