Hi,

About your question on how to learn the parameters of anomaly detection algorithms using only the negative samples: Nicolas and I worked on this aspect recently. If you are interested, you can have a look at:
- Learning hyperparameters for unsupervised anomaly detection:
  https://drive.google.com/file/d/0B8Dg3PBX90KNUTg5NGNOVnFPX0hDNmJsSTcybzZMSHNPYkd3/view
- How to evaluate the quality of unsupervised anomaly detection algorithms?:
  https://drive.google.com/file/d/0B8Dg3PBX90KNenV3WjRkR09Bakx5YlNyMF9BUXVNem1hb0NR/view

Best,
Albert

On Fri, Aug 5, 2016 at 9:34 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:
>
> > But this might be the kind of problem where you seriously ask how hard
> > it would be to gather more data.
>
> Yeah, I agree, but this scenario is then typical in the sense that it is
> an anomaly detection problem rather than a classification problem. I.e.,
> you don’t have enough positive labels to fit the model, and thus you need
> to do unsupervised learning to learn from the negative class only.
>
> Sure, supervised learning could work well, but I would also explore
> unsupervised learning here and see how that works for you; maybe a
> one-class SVM as suggested, or EM-algorithm-based mixture models
> (http://scikit-learn.org/stable/modules/mixture.html).
>
> Best,
> Sebastian
>
> > On Aug 5, 2016, at 2:55 PM, Jared Gabor <jgabor.as...@gmail.com> wrote:
> >
> > Lots of great suggestions on how to model your problem. But this might
> > be the kind of problem where you seriously ask how hard it would be to
> > gather more data.
> >
> > On Thu, Aug 4, 2016 at 2:17 PM, Amita Misra <amis...@ucsc.edu> wrote:
> > Hi,
> >
> > I am currently exploring the problem of speed bump detection using
> > accelerometer time series data. I have extracted some features based on
> > the mean, standard deviation, etc. within a time window.
> >
> > Since the dataset is highly skewed (I have just 5 positive samples for
> > every 300 samples), I was looking into
> >
> > OneClassSVM
> > covariance.EllipticEnvelope
> > sklearn.ensemble.IsolationForest
> >
> > but I am not sure how to use them.
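To make the quoted question concrete, here is a minimal sketch of the fit-on-negatives-only / predict workflow for two of the estimators mentioned. The data below is a synthetic stand-in (random Gaussian "features"), not the original accelerometer features, and the hyperparameter values are placeholders:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# Synthetic stand-in for windowed accelerometer features:
# "normal" driving windows cluster near 0, bump-like windows lie far away.
X_train = rng.normal(0, 1, size=(300, 4))           # negative (normal) samples only
X_test = np.vstack([rng.normal(0, 1, size=(10, 4)),  # unseen normal windows
                    rng.normal(6, 1, size=(5, 4))])  # bump-like outliers

# Fit on the negative class only; predict() returns +1 for inliers, -1 for outliers.
clf = OneClassSVM(nu=0.05, gamma="auto").fit(X_train)
print(clf.predict(X_test))

iso = IsolationForest(random_state=0).fit(X_train)
print(iso.predict(X_test))
```

Both estimators never see a positive label during fit; the positives only show up at predict time, which is exactly what makes the question below (what to do with the 5 positive samples) interesting.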
> >
> > What I get from the docs:
> >
> > separate out the positive examples and train using only the negative
> > examples,
> >     clf.fit(X_train)
> > and then predict the positive examples using
> >     clf.predict(X_test)
> >
> > I am not sure what the role of the positive examples in my training
> > dataset is, then, or how I can use them to improve my classifier so
> > that it predicts better on new samples.
> >
> > Can we do something like cross-validation to learn the parameters, as
> > in normal binary SVM classification?
> >
> > Thanks,
> > Amita
> >
> > --
> > Amita Misra
> > Graduate Student Researcher
> > Natural Language and Dialogue Systems Lab
> > Baskin School of Engineering
> > University of California Santa Cruz
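On the cross-validation question quoted above: plain GridSearchCV does not apply directly, since fit() sees no labels, but the handful of positive samples can still drive hyperparameter selection — fit each candidate on negatives only, then score it on held-out negatives plus all the positives. A hand-rolled sketch with synthetic data (the grid values and the F1-based scoring are illustrative assumptions, not a recommendation from the papers linked above):

```python
import numpy as np
from itertools import product
from sklearn.svm import OneClassSVM
from sklearn.metrics import f1_score

rng = np.random.RandomState(0)
X_neg = rng.normal(0, 1, size=(300, 4))  # ~300 normal windows
X_pos = rng.normal(6, 1, size=(5, 4))    # the few speed-bump windows

# Hold out part of the negatives for scoring; positives are used *only* for scoring.
X_fit, X_heldout = X_neg[:240], X_neg[240:]
X_val = np.vstack([X_heldout, X_pos])
y_val = np.r_[np.ones(len(X_heldout)), -np.ones(len(X_pos))]  # +1 inlier, -1 outlier

best = None
for nu, gamma in product([0.01, 0.05, 0.1], [0.1, 0.5, 1.0]):
    clf = OneClassSVM(nu=nu, gamma=gamma).fit(X_fit)   # train on negatives only
    score = f1_score(y_val, clf.predict(X_val), pos_label=-1)  # how well bumps are caught
    if best is None or score > best[0]:
        best = (score, nu, gamma)

print("best f1=%.2f with nu=%s gamma=%s" % best)
```

With only 5 positives the score estimate is very noisy, so repeating the split (or leave-one-positive-out) before trusting a parameter choice would be prudent.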
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn