Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Nicolas Goix
There are different ways of aggregating estimators. One possibility is to take a majority vote; another is to average the decision functions.

On Aug 4, 2016 8:44 PM, "Amita Misra" wrote:
> If I train multiple algorithms on different subsamples, then how do I get
> the final classifier
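The two aggregation options mentioned in this reply (majority vote, averaging decision functions) can be sketched in a few lines. This is only an illustration, not code from the thread: the function names `majority_vote` and `average_scores`, the 0/1 label convention, and the zero threshold are all assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine hard labels. `predictions` holds one list of labels per model,
    all of the same length (one label per sample)."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        # Label with the most votes; an odd number of models avoids ties
        # in the two-class case.
        voted.append(votes.most_common(1)[0][0])
    return voted


def average_scores(scores, threshold=0.0):
    """Combine real-valued decision scores by averaging them per sample,
    then threshold the mean (assumed convention: score > threshold means 1)."""
    n_models = len(scores)
    n_samples = len(scores[0])
    means = [sum(s[i] for s in scores) / n_models for i in range(n_samples)]
    labels = [1 if m > threshold else 0 for m in means]
    return means, labels


# Three hypothetical models voting on three samples.
voted = majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 1]])
# Two hypothetical models scoring two samples.
means, labels = average_scores([[-1.0, 0.5], [-0.5, 1.5]])
```

Averaging keeps each model's confidence, whereas voting only counts hard decisions, so the two can disagree on borderline samples.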

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Amita Misra
If I train multiple algorithms on different subsamples, then how do I get the final classifier that predicts unseen data? I have very few positive samples since it is speed bump detection and we have very few speed bumps in a drive. However, I think that unseen new data would be quite similar

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Amita Misra
Sub-sampling would remove a lot of information from the negative class. I have more than 500 samples of the negative class and just 5 samples of the positive class.

Amita

On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix wrote:
> Hi,
>
> Yes you can use your labeled data (you will need

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-04 Thread Nicolas Goix
Hi,

Yes, you can use your labeled data to learn your hyper-parameters through CV (you will need to sub-sample your normal class so that the normal and abnormal proportions are similar). You can also try supervised classification algorithms on 'not too highly unbalanced' sub-samples.

Nicolas

On Thu,
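A rough sketch of the sub-sampling idea in this message: keep every abnormal sample, draw a different small random subset of the normal class for each model, and fit one supervised classifier per draw. Everything specific here is an assumption for illustration, not something stated in the thread: the synthetic data (sized after the roughly 500 negatives and 5 positives mentioned elsewhere in the thread), the choice of `LogisticRegression`, the 3:1 ratio, the number of models, and the helper names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_on_balanced_subsamples(X, y, n_models=11, neg_per_pos=3, seed=0):
    """Fit one classifier per random sub-sample of the normal (0) class,
    always keeping every abnormal (1) sample."""
    rng = np.random.RandomState(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_neg = min(len(neg_idx), neg_per_pos * len(pos_idx))
    models = []
    for _ in range(n_models):
        # A fresh draw of negatives for each model, without replacement.
        sub_neg = rng.choice(neg_idx, size=n_neg, replace=False)
        idx = np.concatenate([pos_idx, sub_neg])
        models.append(LogisticRegression().fit(X[idx], y[idx]))
    return models


def predict_by_average(models, X):
    """Average the models' decision functions and threshold at zero."""
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return (scores > 0).astype(int)


# Synthetic stand-in data: 500 normal points and 5 abnormal points.
data_rng = np.random.RandomState(42)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(500, 2)),
               data_rng.normal(6.0, 0.5, size=(5, 2))])
y = np.array([0] * 500 + [1] * 5)

models = fit_on_balanced_subsamples(X, y)
preds = predict_by_average(models, np.array([[0.0, 0.0], [6.0, 6.0]]))
```

Because each model sees a different draw of the normal class, the ensemble uses more of the negative data than any single sub-sample would.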

[scikit-learn] BM25 Pull Request

2016-08-04 Thread Basil Beirouti
Hi all, Just sending an email for visibility. I've made a pull request to add BM25 capabilities to complement TF-IDF in feature_extraction.text. All tests pass. Sincerely, Basil Beirouti
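For readers unfamiliar with BM25, here is a minimal pure-Python sketch of Okapi BM25 term weighting. It is not the code from the pull request, whose API is not shown in this message. The function name `bm25_weights`, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document in `docs`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        doc_weights = {}
        for term, freq in tf.items():
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates: the factor stays below (k1 + 1).
            doc_weights[term] = idf * freq * (k1 + 1) / (freq + norm)
        weights.append(doc_weights)
    return weights


docs = [["speed", "bump", "bump"],
        ["speed", "road"],
        ["road", "sign", "sign", "sign"]]
w = bm25_weights(docs)
```

Unlike plain TF-IDF, the term-frequency factor here is bounded, so repeating a term many times yields diminishing returns, and document length enters the weight explicitly through `b`.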