Dear David,

We recently submitted PipeGraph as a sklearn contrib project. Although it is an ongoing project and we are currently modifying the interface to make it more suitable and useful for the sklearn community, I believe that PipeGraph can address the problems you describe. If you need to define different (or identical) transformations for X and y, you can do so by simply defining different steps for each path; if you need different paths for fit and predict, that is also possible in PipeGraph. Please have a look at the general examples and judge for yourself whether it fits your needs:
https://mcasl.github.io/PipeGraph/auto_examples/plot_4_example_combination_of_classifiers.html#sphx-glr-auto-examples-plot-4-example-combination-of-classifiers-py

You can try it out by installing it with pip:

    pip install pipegraph

The API is still far from stable, and we are following the advice of the sklearn community to make it as useful as possible, but in my humble opinion PipeGraph can provide a suitable solution in situations like this.

Best regards,
Manolo

2018-02-27 19:42 GMT+01:00 Guillaume Lemaître <g.lemaitr...@gmail.com>:
> Transforming y is a big deal :)
> You can refer to https://github.com/scikit-learn/enhancement_proposals/pull/2
> and the associated issues/PRs to see what is going on. This is probably an
> additional use case to think about when designing estimators that will
> modify y.
>
> Regarding the pipeline, I assume that your strategy would be to resample
> at fit time and do nothing at predict time, is that right?
>
> NB: you could actually implement this sampling in a FunctionSampler of
> imblearn:
> http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.FunctionSampler.html#imblearn.FunctionSampler
> and then use the imblearn pipeline, which would apply the transform at fit
> time but not at predict time.
>
> On 27 February 2018 at 18:02, David Burns <david.mo.bu...@gmail.com>
> wrote:
>
>> First post on this mailing list.
>>
>> I have been working with time series data for a project, and thought I
>> could contribute a new transformer to segment time series data using a
>> sliding window with variable overlap. I have attached a demonstration of
>> how this would fit in the existing framework. The only challenge for me
>> here is that the transformer needs to transform both the X and y variables
>> in order to perform the segmentation. I am not sure from the documentation
>> how to implement this in the framework.
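A minimal sketch of the kind of segmentation described here, assuming NumPy arrays, a window `width` and `overlap` counted in samples, and each window labelled with the target of its last sample (one possible choice; a majority vote over the window is another). `segment` is a hypothetical helper, not the attached demonstration and not a scikit-learn API. It shows why y has to be transformed as well: the number of rows changes, so at fit time y must shrink together with X, while at predict time there is no y and only X is segmented.

```python
import numpy as np


def segment(X, y=None, width=4, overlap=2):
    """Cut a time series into fixed-width, overlapping windows.

    Hypothetical helper: `width` and `overlap` are counted in samples, and
    each window takes the target of its last sample as its label.
    """
    X = np.asarray(X)
    if not 0 <= overlap < width:
        raise ValueError("overlap must satisfy 0 <= overlap < width")
    if len(X) < width:
        raise ValueError("series is shorter than one window")
    step = width - overlap
    # Start index of every window that fits entirely inside the series.
    starts = np.arange(0, len(X) - width + 1, step)
    # idx[i] lists the row indices covered by window i.
    idx = starts[:, None] + np.arange(width)
    # Flatten each window into one row so that 2-D estimators accept it.
    X_seg = X[idx].reshape(len(starts), -1)
    if y is None:
        # Predict time: there is no target, so only X is segmented.
        return X_seg
    # Fit time: y shrinks to one label per window to stay aligned with X.
    y_seg = np.asarray(y)[starts + width - 1]
    return X_seg, y_seg


X = np.arange(20).reshape(10, 2)  # 10 time steps, 2 channels
y = np.arange(10)                 # one label per time step
X_seg, y_seg = segment(X, y, width=4, overlap=2)
print(X_seg.shape, y_seg)  # (4, 8) [3 5 7 9]
```

Flattening each window keeps the output 2-D so that ordinary estimators accept it; a model that works directly on (n_windows, width, n_channels) arrays could skip the reshape.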
>>
>> Using overlapping segments is a great way to boost performance for time
>> series classifiers, so this may be a worthwhile contribution for some in
>> this area of ML. Ultimately, model_selection.TimeSeriesSplit would need to
>> be modified to support overlapping segments, or a new class created to
>> enable validation for them.
>>
>> Please let me know if this would be a worthwhile contribution and, if so,
>> how to go about transforming the target vector y in the framework /
>> pipeline.
>>
>> Thanks!
>>
>> David Burns
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
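On the validation point raised about model_selection.TimeSeriesSplit: with overlapping segments, a naive split of the windows lets training and test windows share raw samples. One way around this is to split the raw series in time first and cut windows separately on each side of the boundary. The sketch below assumes that approach; `segmented_time_series_split` is a hypothetical helper, not a scikit-learn class. It only mimics the forward-chaining idea of TimeSeriesSplit and yields the start index of each window rather than the windows themselves.

```python
import numpy as np


def segmented_time_series_split(n_samples, width, overlap, n_splits=3):
    """Forward-chaining splits whose windows never straddle the train/test cut.

    Hypothetical helper in the spirit of TimeSeriesSplit: the raw series is
    divided into n_splits + 1 contiguous blocks, fold k trains on the first
    k blocks and tests on block k + 1. Yields (train_starts, test_starts),
    the start index of every window on each side of the cut.
    """
    if not 0 <= overlap < width:
        raise ValueError("overlap must satisfy 0 <= overlap < width")
    step = width - overlap
    block = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        cut = k * block  # first raw sample of the test block
        # The last fold's test block absorbs any leftover samples.
        stop = cut + block if k < n_splits else n_samples
        # Training windows end strictly before the cut.
        train_starts = np.arange(0, cut - width + 1, step)
        # Test windows start at or after the cut.
        test_starts = np.arange(cut, stop - width + 1, step)
        yield train_starts, test_starts


for train_starts, test_starts in segmented_time_series_split(20, width=4, overlap=2):
    # Raw rows covered by each window: starts[:, None] + np.arange(width).
    print(train_starts, test_starts)
```

Because windows are cut independently on each side, a few raw samples just before the cut may go unused; that is the price of never letting a window straddle the train/test boundary.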