Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-08 Thread Amita Misra
Thanks for the pointers and papers. I'll definitely go through this approach and see if it can be applied to my problem. Thanks, Amita On Fri, Aug 5, 2016 at 4:40 PM, Albert Thomas wrote: > Hi, > > About your question on how to learn the parameters of anomaly detection

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Albert Thomas
Hi, About your question on how to learn the parameters of anomaly detection algorithms using only the negative samples in your case, Nicolas and I worked on this aspect recently. If you are interested you can have a look at: - Learning hyperparameters for unsupervised anomaly detection:

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Sebastian Raschka
> But this might be the kind of problem where you seriously ask how hard it > would be to gather more data. Yeah, I agree, but this scenario is then typical in the sense that it is an anomaly detection problem rather than a classification problem. I.e., you don’t have enough positive

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Amita Misra
Thanks everyone for the suggestions. Actually we thought of gathering more data but the point is we do not have many speed bumps in our driving area. If we drive over the same speed bump again and again it may not add anything really novel to the data. I think a combination of oversampling and
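
A minimal sketch of the oversampling idea mentioned above, assuming plain random oversampling with replacement via `sklearn.utils.resample`. The synthetic arrays, the feature count, and the target of 50 positives are illustrative assumptions and not values from the thread. As the message itself notes for repeated drives over the same speed bump, duplicated positives add no novel information; they only change the class balance the classifier sees.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)

# Synthetic stand-in data (assumption): 500 normal windows, 5 speed-bump windows.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
X_pos = rng.normal(loc=3.0, scale=1.0, size=(5, 4))

# Randomly oversample the positive class with replacement.
# The target of 50 positives is an arbitrary illustrative choice.
X_pos_over = resample(X_pos, replace=True, n_samples=50, random_state=0)

# Rebuild a training set with the oversampled positives.
X_bal = np.vstack([X_neg, X_pos_over])
y_bal = np.concatenate([
    np.zeros(len(X_neg), dtype=int),
    np.ones(len(X_pos_over), dtype=int),
])
```

Oversampling should be applied only to the training folds, otherwise copies of the same positive sample can end up in both the training and the validation data.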

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Qingkai Kong
> Just to add a few things to the discussion: > > 1. For unbalanced prob

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Dale T Smith
On Thu, Aug 4, 2016 at 9:13 PM, Nicolas Goix wrote: > There are different ways of aggregating estimators.

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Pedro Pazzini
> On Thu, Aug 4, 2016 at 9:13 PM, Nicolas Goix wrote:

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Dale T Smith
> There are different ways of aggregating estimators. One possibility is to take the majority vote, or to average the decision functions. On Aug 4, 2016 8:44 PM, "Amita Misra" <amis...@ucsc.edu>

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Nicolas Goix
There are different ways of aggregating estimators. One possibility is to take the majority vote, or to average the decision functions. On Aug 4, 2016 8:44 PM, "Amita Misra" wrote: > If I train multiple algorithms on different subsamples, then how do I get > the final classifier
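
The two aggregation options named above can be sketched as follows. This is an illustration under stated assumptions, not the method anyone in the thread actually ran: the data is synthetic, `LogisticRegression` is an arbitrary choice of base classifier, and the helper names `fit_subsample_ensemble`, `predict_average`, and `predict_majority` are hypothetical. Each model sees all positives plus a different random sub-sample of the negatives.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Synthetic stand-in data (assumption): 500 normal windows, 5 speed-bump windows.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
X_pos = rng.normal(loc=3.0, scale=1.0, size=(5, 4))


def fit_subsample_ensemble(X_neg, X_pos, n_estimators=10, n_neg_per_model=25, seed=0):
    """Fit one classifier per random sub-sample of the negative class."""
    sub_rng = np.random.RandomState(seed)
    models = []
    for _ in range(n_estimators):
        # Draw a different subset of negatives for each model; keep all positives.
        idx = sub_rng.choice(len(X_neg), size=n_neg_per_model, replace=False)
        X = np.vstack([X_neg[idx], X_pos])
        y = np.concatenate([
            np.zeros(n_neg_per_model, dtype=int),
            np.ones(len(X_pos), dtype=int),
        ])
        models.append(LogisticRegression().fit(X, y))
    return models


def predict_average(models, X):
    """Average the decision functions, then threshold at zero."""
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return (scores > 0).astype(int)


def predict_majority(models, X):
    """Predict 1 when a strict majority of models vote 1 (ties go to 0)."""
    votes = np.array([m.predict(X) for m in models])
    return (2 * votes.sum(axis=0) > len(models)).astype(int)


models = fit_subsample_ensemble(X_neg, X_pos)
X_new = np.vstack([X_neg[:3], X_pos[:2]])
pred_avg = predict_average(models, X_new)
pred_vote = predict_majority(models, X_new)
```

Averaging decision functions keeps each model's confidence, while majority voting only counts hard decisions; which one works better on a given dataset is an empirical question.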

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Amita Misra
If I train multiple algorithms on different subsamples, then how do I get the final classifier that predicts unseen data? I have very few positive samples since it is speed bump detection and we have very few speed bumps in a drive. However, I think that unseen new data would be quite similar

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Amita Misra
Sub-sampling would remove a lot of information from the negative class. I have more than 500 samples of negative class and just 5 samples of positive class. Amita On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix wrote: > Hi, > > Yes you can use your labeled data (you will need

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Nicolas Goix
Hi, Yes you can use your labeled data (you will need to sub-sample your normal class to get similar proportions of normal and abnormal samples) to learn your hyper-parameters through CV. You can also try to use supervised classification algorithms on 'not too highly unbalanced' sub-samples. Nicolas On Thu,
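
A rough sketch of the suggestion above: sub-sample the normal class, then pick hyper-parameters by cross-validation on the labeled sub-sample. Everything specific here is an assumption for illustration: the synthetic data, the 20-to-5 class ratio, the `SVC` estimator, the parameter grid, and the `average_precision` scoring are not taken from the thread.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Synthetic stand-in data (assumption): 500 normal windows, 5 speed-bump windows.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
X_pos = rng.normal(loc=3.0, scale=1.0, size=(5, 4))

# Sub-sample the normal class so the problem is not too highly unbalanced.
n_neg = 20  # illustrative choice
idx = rng.choice(len(X_neg), size=n_neg, replace=False)
X = np.vstack([X_neg[idx], X_pos])
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(len(X_pos), dtype=int)])

# Stratified folds keep one positive sample in every validation fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Search a small, arbitrary grid of hyper-parameters by cross-validation.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    scoring="average_precision",
    cv=cv,
)
search.fit(X, y)
best_params = search.best_params_
```

With only 5 positives, each validation fold contains a single positive sample, so the cross-validated scores are noisy; repeating the search over several different negative sub-samples would give a more stable picture.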