Thanks for the pointers and the papers. I'll definitely go through this approach and see whether it can be applied to my problem.
Thanks,
Amita

On Fri, Aug 5, 2016 at 4:40 PM, Albert Thomas <albertthoma...@gmail.com> wrote:
> Hi,
>
> About your question on how to learn the parameters of anomaly detection algorithms using only the negative samples in your case, Nicolas and I worked on this aspect recently. If you are interested, you can have a look at:
>
> - Learning hyperparameters for unsupervised anomaly detection:
>   https://drive.google.com/file/d/0B8Dg3PBX90KNUTg5NGNOVnFPX0hDNmJsSTcybzZMSHNPYkd3/view
> - How to evaluate the quality of unsupervised anomaly Detection algorithms?:
>   https://drive.google.com/file/d/0B8Dg3PBX90KNenV3WjRkR09Bakx5YlNyMF9BUXVNem1hb0NR/view
>
> Best,
> Albert
>
> On Fri, Aug 5, 2016 at 9:34 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:
>
>> > But this might be the kind of problem where you seriously ask how hard it would be to gather more data.
>>
>> Yeah, I agree, but this scenario is then typical in the sense that it is an anomaly detection problem rather than a classification problem. I.e., you don’t have enough positive labels to fit the model and thus you need to do unsupervised learning to learn from the negative class only.
>>
>> Sure, supervised learning could work well, but I would also explore unsupervised learning here and see how that works for you; maybe a one-class SVM as suggested, or EM-algorithm-based mixture models (http://scikit-learn.org/stable/modules/mixture.html).
>>
>> Best,
>> Sebastian
>>
>> > On Aug 5, 2016, at 2:55 PM, Jared Gabor <jgabor.as...@gmail.com> wrote:
>> >
>> > Lots of great suggestions on how to model your problem. But this might be the kind of problem where you seriously ask how hard it would be to gather more data.
>> >
>> > On Thu, Aug 4, 2016 at 2:17 PM, Amita Misra <amis...@ucsc.edu> wrote:
>> > Hi,
>> >
>> > I am currently exploring the problem of speed bump detection using accelerometer time series data.
>> > I have extracted some features based on the mean, standard deviation, etc. within a time window.
>> >
>> > Since the dataset is highly skewed (I have just 5 positive samples for every > 300 samples), I was looking into
>> >
>> > OneClassSVM
>> > covariance.EllipticEnvelope
>> > sklearn.ensemble.IsolationForest
>> >
>> > but I am not sure how to use them.
>> >
>> > What I understand from the docs is to separate out the positive examples and train using only the negative examples:
>> >
>> > clf.fit(X_train)
>> >
>> > and then predict the positive examples using
>> >
>> > clf.predict(X_test)
>> >
>> > I am not sure what role the positive examples in my training dataset then play, or how I can use them to improve my classifier so that it predicts better on new samples.
>> >
>> > Can we do something like cross-validation to learn the parameters, as in normal binary SVM classification?
>> >
>> > Thanks,
>> > Amita
>> >
>> > Amita Misra
>> > Graduate Student Researcher
>> > Natural Language and Dialogue Systems Lab
>> > Baskin School of Engineering
>> > University of California Santa Cruz
>> >
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn@python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
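The original question mentions features based on the mean, standard deviation, etc. within a time window. A minimal sketch of that kind of windowed feature extraction, assuming a single 1-D accelerometer channel and NumPy 1.20 or later (for `sliding_window_view`). The function name `window_features` and the window and step sizes are hypothetical, not taken from the thread:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_features(signal, window=50, step=25):
    """Per-window summary statistics of a 1-D signal (hypothetical helper).

    window and step are illustrative values, measured in samples.
    Returns an array of shape (n_windows, 4): mean, std, min, max.
    """
    signal = np.asarray(signal, dtype=float)
    # All windows of length `window`, then keep every `step`-th one.
    windows = sliding_window_view(signal, window)[::step]
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        windows.min(axis=1),
        windows.max(axis=1),
    ])


rng = np.random.default_rng(0)
X = window_features(rng.normal(size=1000))  # synthetic stand-in signal
print(X.shape)
```

Each row of the result is one candidate "speed bump or not" sample for the detectors discussed in the thread.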
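The workflow described in the original question (fit only on the negative examples, then call `predict` on new data) looks roughly like this in scikit-learn. This is a sketch on synthetic data; the feature matrix, the `nu` and `contamination` values, and the relabelling of -1 as "speed bump" are illustrative assumptions. For all three estimators, `predict` returns +1 for inliers and -1 for outliers:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-ins for window features: "normal road" (negatives)
# and a few "speed bump" windows (positives) lying far from them.
X_neg = rng.normal(0.0, 1.0, size=(600, 4))
X_pos = rng.normal(4.0, 1.0, size=(10, 4))

# Train on negatives only; the test set mixes held-out negatives and positives.
X_train, X_neg_test = X_neg[:500], X_neg[500:]
X_test = np.vstack([X_neg_test, X_pos])
y_test = np.r_[np.zeros(len(X_neg_test)), np.ones(len(X_pos))]

detectors = {
    "OneClassSVM": make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, gamma="scale")),
    "EllipticEnvelope": EllipticEnvelope(contamination=0.05, random_state=0),
    "IsolationForest": IsolationForest(contamination=0.05, random_state=0),
}

predictions = {}
for name, clf in detectors.items():
    clf.fit(X_train)              # no labels: negatives only
    pred = clf.predict(X_test)    # +1 = inlier, -1 = outlier
    flags = (pred == -1).astype(int)  # relabel: 1 = flagged as "speed bump"
    predictions[name] = flags
    recall = flags[y_test == 1].mean()
    print(f"{name}: {flags.sum()} flagged, recall on positives {recall:.2f}")
```

The positives are never seen by `fit`; their role is in evaluation and tuning, since they are the only evidence that an outlier flag actually corresponds to a speed bump.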
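On the cross-validation question: one generic option, when a handful of labelled positives exist, is to fit each candidate model on folds of the negatives only and score it on the held-out negatives plus all positives. The sketch below does this for `OneClassSVM`, with average precision as the assumed metric. It is a plain validation scheme, not the negatives-only method described in the papers Albert linked; the grid values, the helper name `select_ocsvm_params` and the synthetic data are hypothetical:

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.model_selection import KFold, ParameterGrid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def select_ocsvm_params(X_neg, X_pos, grid, n_splits=5, seed=0):
    """Hypothetical helper: pick OneClassSVM parameters by validation.

    Models are only ever fitted on negatives; positives are used purely
    for scoring, together with a held-out fold of negatives.
    """
    best_params, best_score = None, -np.inf
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for params in ParameterGrid(grid):
        scores = []
        for train_idx, val_idx in kf.split(X_neg):
            clf = make_pipeline(StandardScaler(), OneClassSVM(**params))
            clf.fit(X_neg[train_idx])
            X_val = np.vstack([X_neg[val_idx], X_pos])
            y_val = np.r_[np.zeros(len(val_idx)), np.ones(len(X_pos))]
            # decision_function is high for inliers; negate it so that
            # likely speed bumps get the highest anomaly score.
            anomaly_score = -clf.decision_function(X_val)
            scores.append(average_precision_score(y_val, anomaly_score))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 1.0, size=(300, 4))  # synthetic negatives
X_pos = rng.normal(3.0, 1.0, size=(5, 4))    # synthetic positives
grid = {"nu": [0.01, 0.05, 0.1], "gamma": [0.01, 0.1, 1.0]}
params, score = select_ocsvm_params(X_neg, X_pos, grid)
print(params, round(score, 3))
```

With only five positives the validation score is noisy, so small differences between grid points may not be meaningful.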
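Sebastian's suggestion of EM-based mixture models can be applied to anomaly detection by fitting a density to the negatives and flagging low-likelihood windows. A sketch with `sklearn.mixture.GaussianMixture` on synthetic data; the number of components and the 1st-percentile threshold are assumptions that would need tuning, for example against the few labelled positives:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 4))  # synthetic negatives only
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 4)),        # synthetic "normal" windows
    rng.normal(4.0, 1.0, size=(5, 4)),         # synthetic "speed bump" windows
])

# Fit a density model of normal windows with EM (n_components is an assumption).
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X_train)

# Per-sample log-likelihood; anomalies should score unusually low.
train_ll = gmm.score_samples(X_train)
threshold = np.percentile(train_ll, 1)  # assumed cut-off: lowest 1% of training data

test_ll = gmm.score_samples(X_test)
is_anomaly = test_ll < threshold
print(int(is_anomaly.sum()), "of", len(X_test), "windows flagged")
```

Unlike `predict` on the outlier detectors, this exposes the threshold explicitly, which makes it easy to trade false alarms against missed bumps.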