Re: [scikit-learn] Supervised anomaly detection in time series

Nicolas Goix Thu, 04 Aug 2016 18:14:57 -0700

There are different ways of aggregating estimators. One possibility is to
take the majority vote; another is to average the decision functions.
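
To make the two aggregation options concrete, here is a minimal sketch rather than a definitive recipe. The data is synthetic (500 normal windows and 5 speed-bump windows with 4 made-up features). The choice of LogisticRegression, the number of sub-samples, and the 3:1 ratio are all assumptions for illustration. The helper names predict_majority and predict_mean_decision are hypothetical, not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Synthetic stand-in for windowed accelerometer features (hypothetical data).
X_neg = rng.normal(loc=0.0, scale=1.0, size=(500, 4))  # normal windows
X_pos = rng.normal(loc=3.0, scale=1.0, size=(5, 4))    # speed-bump windows

n_estimators = 25     # assumed number of sub-samples
n_neg_per_model = 15  # assumed 3:1 ratio, i.e. "not too unbalanced"

estimators = []
for _ in range(n_estimators):
    # Each model sees all positives and a different random slice of negatives.
    idx = rng.choice(len(X_neg), size=n_neg_per_model, replace=False)
    X_sub = np.vstack([X_neg[idx], X_pos])
    y_sub = np.r_[np.zeros(n_neg_per_model, dtype=int),
                  np.ones(len(X_pos), dtype=int)]
    estimators.append(LogisticRegression().fit(X_sub, y_sub))


def predict_majority(estimators, X):
    """Label a sample positive when more than half of the models say so."""
    votes = np.array([est.predict(X) for est in estimators])
    return (votes.mean(axis=0) > 0.5).astype(int)


def predict_mean_decision(estimators, X, threshold=0.0):
    """Average the signed decision_function values, then threshold."""
    scores = np.mean([est.decision_function(X) for est in estimators], axis=0)
    return (scores > threshold).astype(int)


# Unseen windows: three normal-looking, two bump-looking (again synthetic).
X_new = np.vstack([rng.normal(0.0, 1.0, size=(3, 4)),
                   rng.normal(3.0, 1.0, size=(2, 4))])
pred_vote = predict_majority(estimators, X_new)
pred_mean = predict_mean_decision(estimators, X_new)
print(pred_vote)
print(pred_mean)
```

Taken together, the sub-samples cover most of the negative class, so less information is thrown away than with a single sub-sample. Averaging decision functions keeps each model's confidence, while voting only keeps its hard label.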

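For the earlier advice quoted below (learn hyper-parameters through CV with the labeled data, but do not use accuracy), a rough sketch could look like the following. The data is synthetic, and IsolationForest, the max_samples grid, and average precision as the metric are all assumptions chosen for illustration. The one-class model is fit on normal windows only; the few labeled positives are used only to score each held-out fold.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)

# Synthetic stand-in: 500 normal windows, 5 speed bumps (hypothetical data).
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(4.0, 1.0, size=(5, 4))])
y = np.r_[np.zeros(500, dtype=int), np.ones(5, dtype=int)]  # 1 = speed bump

# Why accuracy is misleading here: always predicting "normal" scores ~0.99.
baseline_accuracy = float(np.mean(y == 0))

# Five folds, so every test fold holds exactly one of the five positives.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for max_samples in (32, 64, 128):  # assumed candidate values
    fold_scores = []
    for train_idx, test_idx in cv.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        model = IsolationForest(n_estimators=100, max_samples=max_samples,
                                random_state=0)
        # Fit on the normal class only; positives are held back for scoring.
        model.fit(X_tr[y_tr == 0])
        # decision_function is lower for abnormal points, so negate it
        # to obtain a score that increases with abnormality.
        anomaly_score = -model.decision_function(X[test_idx])
        fold_scores.append(average_precision_score(y[test_idx], anomaly_score))
    results[max_samples] = float(np.mean(fold_scores))

best_max_samples = max(results, key=results.get)
print(baseline_accuracy)
print(results, best_max_samples)
```

With only 5 positives, each fold's score rests on a single speed bump, so the estimates are noisy and should be read with caution.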

On Aug 4, 2016 8:44 PM, "Amita Misra" <amis...@ucsc.edu> wrote:

> If I train multiple algorithms on different subsamples, then how do I get
> the final classifier that predicts unseen data?
>
>
> I have very few positive samples because this is speed bump detection and
> there are very few speed bumps in a drive.
> However, I think that unseen new data would be quite similar to what I
> have in the training data, so if I can correctly learn a classifier for
> these 5, I hope it will work well for unseen speed bumps.
>
> Thanks,
> Amita
>
> On Thu, Aug 4, 2016 at 5:23 PM, Nicolas Goix <goix.nico...@gmail.com>
> wrote:
>
>> You can evaluate your hyper-parameters on a few samples. Just don't use
>> accuracy as your performance measure: with 5 positives among more than
>> 500 samples, always predicting the normal class is already about 99%
>> accurate.
>>
>> For supervised classification, training multiple algorithms on small
>> balanced subsamples usually works well, but 5 anomalies does indeed seem
>> to be very few.
>>
>> Nicolas
>>
>> On Aug 4, 2016 7:51 PM, "Amita Misra" <amis...@ucsc.edu> wrote:
>>
>>> Sub-sampling would remove a lot of information from the negative class.
>>> I have more than 500 samples of the negative class and just 5 samples of
>>> the positive class.
>>>
>>> Amita
>>>
>>> On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix <goix.nico...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, you can use your labeled data to learn your hyper-parameters
>>>> through CV (you will need to sub-sample your normal class to get a
>>>> similar proportion of normal and abnormal samples).
>>>>
>>>> You can also try supervised classification algorithms on sub-samples
>>>> that are not too highly unbalanced.
>>>>
>>>> Nicolas
>>>>
>>>> On Thu, Aug 4, 2016 at 5:17 PM, Amita Misra <amis...@ucsc.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am currently exploring the problem of speed bump detection using
>>>>> accelerometer time series data.
>>>>> I have extracted some features (mean, standard deviation, etc.)
>>>>> within a time window.
>>>>>
>>>>> Since the dataset is highly skewed (I have just 5 positive samples
>>>>> for every 300+ samples), I was looking into
>>>>>
>>>>> sklearn.svm.OneClassSVM
>>>>> sklearn.covariance.EllipticEnvelope
>>>>> sklearn.ensemble.IsolationForest
>>>>>
>>>>> but I am not sure how to use them.
>>>>>
>>>>> What I understand from the docs is that I should set the positive
>>>>> examples aside and train using only negative examples:
>>>>>
>>>>> clf.fit(X_train)
>>>>>
>>>>> and then predict the positive examples using
>>>>>
>>>>> clf.predict(X_test)
>>>>>
>>>>>
>>>>> I am not sure what role the positive examples in my training dataset
>>>>> then play, or how I can use them to improve my classifier so that it
>>>>> predicts better on new samples.
>>>>>
>>>>>
>>>>> Can we do something like cross-validation to learn the parameters, as
>>>>> in normal binary SVM classification?
>>>>>
>>>>> Thanks,
>>>>> Amita
>>>>>
>>>>> Amita Misra
>>>>> Graduate Student Researcher
>>>>> Natural Language and Dialogue Systems Lab
>>>>> Baskin School of Engineering
>>>>> University of California Santa Cruz
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn@python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn