Thanks, everyone, for your suggestions.

I will have a look at PipeGraph, which might be a suitable option for us, as Guillaume suggested.

If it works out, I will share it.

Thanks

David


On 02/28/2018 08:29 AM, scikit-learn-requ...@python.org wrote:

Today's Topics:

    1. New Transformer (David Burns)
    2. Re: New Transformer (Guillaume Lemaître)
    3. Re: New Transformer (Manuel Castejón Limas)


----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Feb 2018 12:02:27 -0500
From: David Burns <david.mo.bu...@gmail.com>
To: scikit-learn@python.org
Subject: [scikit-learn] New Transformer
Message-ID: <726f2e70-63eb-783f-b470-5ea45af93...@gmail.com>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

First post on this mailing list.

I have been working with time series data for a project, and thought I
could contribute a new transformer to segment time series data using a
sliding window, with variable overlap. I have attached a demonstration of
how this would fit in the existing framework. The only challenge for me
here is that the transformer needs to transform both the X and y
variables in order to perform the segmentation. I am not sure from the
documentation how to implement this in the framework.
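
To make the idea concrete, here is a minimal sketch of sliding-window
segmentation that changes X and y together. It is not the scrubbed
attachment: the function name, the `width` and `overlap` parameters, and the
rule that each window takes the target of its last sample are all assumptions
made for illustration.

```python
def sliding_window_segments(X, y, width, overlap=0.5):
    """Cut one series into fixed-width windows that may overlap.

    X and y are equal-length sequences (lists or NumPy arrays) with one
    target per time step. Assumption: each window is labelled with the
    target of its last sample; the original attachment may differ.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    if width < 1:
        raise ValueError("width must be at least 1")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")

    # Distance between window starts; at least 1 so the loop advances.
    step = max(1, int(round(width * (1 - overlap))))

    X_seg, y_seg = [], []
    for start in range(0, len(X) - width + 1, step):
        stop = start + width
        X_seg.append(X[start:stop])   # one window of samples
        y_seg.append(y[stop - 1])     # assumed labelling rule
    return X_seg, y_seg


X = list(range(10))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X_seg, y_seg = sliding_window_segments(X, y, width=4, overlap=0.5)
```

Because the number of windows differs from the number of input samples, y
has to be rebuilt alongside X, which is what a transformer that only returns
X cannot express.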

Overlapping segments are a great way to boost performance for time series
classifiers, so this may be a worthwhile contribution for some in this
area of ML. Ultimately, model_selection.TimeSeriesSplit would need to
be modified to support overlapping segments, or a new class created to
enable validation for this.

Please let me know if this would be a worthwhile contribution, and if so
how to go about transforming the target vector y in the framework /
pipeline?

Thanks!

David Burns



-------------- next part --------------
A non-text attachment was scrubbed...
Name: TimeSeriesSegment.py
Type: text/x-python
Size: 3336 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20180227/143ced86/attachment-0001.py>

------------------------------

Message: 2
Date: Tue, 27 Feb 2018 19:42:52 +0100
From: Guillaume Lemaître <g.lemaitr...@gmail.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] New Transformer
Message-ID:
        <cacdxx9gy91jwt+xjfgtnub_5wvmv279dgums6autzffsnfe...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Transforming y is a big deal :)
You can refer to
https://github.com/scikit-learn/enhancement_proposals/pull/2
and the associated issues/PRs to see what is going on. This is probably an
additional use case to think about when designing estimators that modify y.

Regarding the pipeline, I assume that your strategy would be to resample at
fit and do nothing at predict, wouldn't it?

NB: you could actually implement this sampling in a FunctionSampler of
imblearn:
http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.FunctionSampler.html#imblearn.FunctionSampler
and then use the imblearn pipeline, which would apply the transform at fit
time but not at predict.
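
The fit-time-only behaviour described above can be sketched in plain Python.
This is not imblearn's implementation or API: FitOnlySamplerPipeline,
MajorityClassifier and keep_every_other are hypothetical names used only to
show a sampler that changes X and y during fit and is skipped during predict.

```python
class FitOnlySamplerPipeline:
    """Hypothetical two-step pipeline: sampler at fit time only."""

    def __init__(self, sampler, estimator):
        self.sampler = sampler      # callable: (X, y) -> (X_new, y_new)
        self.estimator = estimator  # object with fit(X, y) and predict(X)

    def fit(self, X, y):
        # The sampler may change the number of rows in both X and y.
        X_res, y_res = self.sampler(X, y)
        self.estimator.fit(X_res, y_res)
        return self

    def predict(self, X):
        # The sampler is skipped: X goes to the estimator unchanged.
        return self.estimator.predict(X)


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.n_seen_ = len(X)
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def keep_every_other(X, y):
    """Toy sampler (assumption): keep every second row of X and y."""
    return X[::2], y[::2]


model = FitOnlySamplerPipeline(keep_every_other, MajorityClassifier())
model.fit([[0], [1], [2], [3], [4], [5]], [1, 0, 1, 0, 1, 1])
predictions = model.predict([[7], [8], [9]])
```

The estimator is trained on three of the six rows, while predict returns one
prediction for each of the three rows it is given.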

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn




