If I train multiple algorithms on different subsamples, then how do I get the final classifier that predicts unseen data?
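One common way to turn several per-subsample models into a single predictor is to average their predicted probabilities and threshold the average. The following is only a minimal sketch of that idea, not a method confirmed anywhere in this thread. The helper names (`fit_subsample_ensemble`, `predict_proba_ensemble`), the choice of `LogisticRegression`, the subsample size, the 0.5 threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_subsample_ensemble(X, y, n_models=25, neg_per_model=10, seed=0):
    """Fit one classifier per subsample: all positives plus a random draw of negatives."""
    rng = np.random.RandomState(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_models):
        # Each model sees a different, roughly balanced slice of the negatives.
        drawn = rng.choice(neg_idx, size=neg_per_model, replace=False)
        idx = np.concatenate([pos_idx, drawn])
        models.append(LogisticRegression().fit(X[idx], y[idx]))
    return models


def predict_proba_ensemble(models, X):
    """Average the positive-class probability over all fitted models."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


# Synthetic stand-in data (assumption): 500 negatives, 5 positives, 4 features.
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(3.0, 1.0, size=(5, 4))])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(5, dtype=int)])

models = fit_subsample_ensemble(X, y)
scores = predict_proba_ensemble(models, X)   # averaged probability of "speed bump"
labels = (scores >= 0.5).astype(int)         # the 0.5 threshold is an arbitrary choice
```

The "final classifier" in this sketch is simply the list of fitted models together with the averaging rule, so new windows are scored with `predict_proba_ensemble(models, X_new)`. Because every model reuses the same few positives, the ensemble mainly reduces variance coming from the choice of negatives.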
I have very few positive samples because this is speed bump detection and a drive contains very few speed bumps. However, I think unseen data would be quite similar to my training data, so if I can learn a classifier that handles these 5 correctly, I hope it will work well for unseen speed bumps.

Thanks,
Amita

On Thu, Aug 4, 2016 at 5:23 PM, Nicolas Goix <goix.nico...@gmail.com> wrote:

> You can evaluate the accuracy of your hyper-parameters on a few samples.
> Just don't use the accuracy as your performance measure.
>
> For supervised classification, training multiple algorithms on small
> balanced subsamples usually works well, but 5 anomalies seems indeed to be
> very little.
>
> Nicolas
>
> On Aug 4, 2016 7:51 PM, "Amita Misra" <amis...@ucsc.edu> wrote:
>
>> SubSample would remove a lot of information from the negative class.
>> I have more than 500 samples of negative class and just 5 samples of
>> positive class.
>>
>> Amita
>>
>> On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix <goix.nico...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Yes you can use your labeled data (you will need to sub-sample your
>>> normal class to have similar proportion normal-abnormal) to learn your
>>> hyper-parameters through CV.
>>>
>>> You can also try to use supervised classification algorithms on `not too
>>> highly unbalanced' sub-samples.
>>>
>>> Nicolas
>>>
>>> On Thu, Aug 4, 2016 at 5:17 PM, Amita Misra <amis...@ucsc.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am currently exploring the problem of speed bump detection using
>>>> accelerometer time series data.
>>>> I have extracted some features based on mean, std deviation etc within
>>>> a time window.
>>>>
>>>> Since the dataset is highly skewed (I have just 5 positive samples
>>>> for every > 300 samples)
>>>> I was looking into
>>>>
>>>> OneClassSVM
>>>> covariance.EllipticEnvelope
>>>> sklearn.ensemble.IsolationForest
>>>>
>>>> but I am not sure how to use them.
>>>>
>>>> What I get from docs
>>>> separate the positive examples and train using only negative examples
>>>>
>>>> clf.fit(X_train)
>>>>
>>>> and then
>>>> predict the positive examples using
>>>> clf.predict(X_test)
>>>>
>>>> I am not sure what is then the role of positive examples in my training
>>>> dataset or how can I use them to improve my classifier so that I can
>>>> predict better on new samples.
>>>>
>>>> Can we do something like Cross validation to learn the parameters as in
>>>> normal binary SVM classification
>>>>
>>>> Thanks,
>>>> Amita
>>>>
>>>> Amita Misra
>>>> Graduate Student Researcher
>>>> Natural Language and Dialogue Systems Lab
>>>> Baskin School of Engineering
>>>> University of California Santa Cruz
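The workflow discussed in the quoted thread (fit on negative examples only, then use the few labeled positives to choose hyper-parameters through cross-validation, with a measure other than accuracy) could look roughly like the sketch below. It is an illustration under stated assumptions, not the approach anyone in the thread confirmed: the helper `cv_auc`, the `nu`/`gamma` grid, and the synthetic data are made up for the example. It also uses ROC AUC as the performance measure rather than sub-sampling the normal class, since ROC AUC does not depend on the class proportions; that substitution is my assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import OneClassSVM


def cv_auc(X, y, nu, gamma, n_splits=5, seed=0):
    """Mean ROC AUC over stratified folds; the model is fit on negatives only."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        # Positives in the training fold are discarded: they are only used for scoring.
        X_train_neg = X[train_idx][y[train_idx] == 0]
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train_neg)
        # decision_function is larger for inliers, so negate it to get an anomaly score.
        anomaly_score = -clf.decision_function(X[test_idx]).ravel()
        aucs.append(roc_auc_score(y[test_idx], anomaly_score))
    return float(np.mean(aucs))


# Synthetic stand-in data (assumption): 500 negatives, 5 positives, 4 features.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(3.0, 1.0, size=(5, 4))])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(5, dtype=int)])

# Hypothetical grid; sensible values depend on the real features and their scaling.
grid = [(nu, gamma) for nu in (0.01, 0.05, 0.1) for gamma in (0.01, 0.1, 1.0)]
results = {params: cv_auc(X, y, *params) for params in grid}
best_nu, best_gamma = max(results, key=results.get)

# Refit on all negatives with the selected parameters; predict() returns -1 for outliers.
final_clf = OneClassSVM(kernel="rbf", nu=best_nu, gamma=best_gamma).fit(X[y == 0])
is_bump = final_clf.predict(X) == -1
```

With 5 positives and 5 stratified folds, each test fold holds a single positive, so the per-fold scores are very noisy and the selected parameters should be treated with caution. The same loop should work with `IsolationForest` or `EllipticEnvelope` in place of `OneClassSVM`, since both also expose `fit`, `decision_function`, and `predict`.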
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn