Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Albert Thomas
Hi, about your question on how to learn the parameters of anomaly detection algorithms using only the negative samples in your case: Nicolas and I worked on this aspect recently. If you are interested, you can have a look at: - Learning hyperparameters for unsupervised anomaly detection:

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Sebastian Raschka
> But this might be the kind of problem where you seriously ask how hard it would be to gather more data.
Yeah, I agree, but this scenario is then typical in the sense that it is an anomaly detection problem rather than a classification problem. I.e., you don’t have enough positive

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Amita Misra
Thanks everyone for the suggestions. Actually we thought of gathering more data but the point is we do not have many speed bumps in our driving area. If we drive over the same speed bump again and again it may not add anything really novel to the data. I think a combination of oversampling and

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Qingkai Kong
I also worked on something similar. Instead of using algorithms that deal with unbalanced data, you can also try to create a balanced dataset using either oversampling or downsampling. scikit-learn-contrib already has a project dealing with unbalanced data:
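A minimal sketch of the oversampling and downsampling idea mentioned above, using only `sklearn.utils.resample`. The toy data, the class sizes, and all variable names are assumptions made for illustration; this is not the contrib project's API.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)

# Hypothetical toy data: 95 "no bump" windows (label 0), 5 "bump" windows (label 1).
X = rng.normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)

X_major, X_minor = X[y == 0], X[y == 1]

# Oversampling: draw minority rows with replacement until the classes match.
X_minor_up = resample(X_minor, replace=True,
                      n_samples=len(X_major), random_state=0)
X_over = np.vstack([X_major, X_minor_up])
y_over = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# Downsampling: draw majority rows without replacement down to the minority size.
X_major_down = resample(X_major, replace=False,
                        n_samples=len(X_minor), random_state=0)
X_under = np.vstack([X_major_down, X_minor])
y_under = np.array([0] * len(X_major_down) + [1] * len(X_minor))

print(X_over.shape, X_under.shape)
```

Resampling is normally applied to the training split only, so that duplicated minority rows cannot leak into the evaluation data.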

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Dale T Smith
To analyze unbalanced classifiers, use from sklearn.metrics import classification_report
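As a small illustration of that suggestion, the sketch below runs `classification_report` on made-up labels. The label vectors and the class names "no bump" / "bump" are assumptions, not data from the thread.

```python
from sklearn.metrics import classification_report

# Hypothetical labels: 8 negatives and 2 positives, with one false
# positive and one false negative in the predictions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Per-class precision, recall and F1, plus the support of each class.
report = classification_report(y_true, y_pred,
                               target_names=["no bump", "bump"])
print(report)
```

Overall accuracy here is 0.8, yet only one of the two positives is found; the per-class rows of the report make that visible where a single accuracy figure would hide it.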

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Pedro Pazzini
Just to add a few things to the discussion: 1. For unbalanced problems, as far as I know, one of the best scores to evaluate a classifier is the Area Under the ROC curve: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html. For that you will have to
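A hedged sketch of computing the area under the ROC curve with `roc_auc_score`. The function is given continuous scores rather than hard class labels, so this example assumes a classifier exposing `predict_proba`; the synthetic dataset and the choice of `LogisticRegression` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic unbalanced data: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score each test sample with the predicted probability of the positive class.
scores = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(auc)
```

For estimators without `predict_proba`, the output of `decision_function` can be passed as the score instead.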

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Dale T Smith
I don’t think you should treat this as an outlier detection problem. Why not try it as a classification problem? The dataset is highly unbalanced. Try http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html and use sample_weight to tell the fit method about the
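A minimal sketch of passing `sample_weight` to `ExtraTreesClassifier.fit`. Building the weights with `compute_sample_weight(class_weight="balanced", ...)` is an assumption about how one might do it, not something stated in the message; the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic unbalanced data: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

# "balanced" weights each sample inversely to its class frequency,
# so both classes carry the same total weight during fitting.
sample_weight = compute_sample_weight(class_weight="balanced", y=y)

clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)

print(np.unique(sample_weight))
```

A similar effect is available through the estimator's own `class_weight="balanced"` parameter, which avoids computing the weights by hand.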