There are several ways to aggregate the estimators. One option is to take a majority vote over their predictions; another is to average their decision functions.
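To make the two options concrete, here is a minimal sketch, not a definitive recipe. It assumes NumPy arrays, labels coded 0 (normal) and 1 (speed bump), and LogisticRegression as the base classifier; the helper names, the 3:1 negative-to-positive ratio, and the synthetic data are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_balanced_ensemble(X, y, n_estimators=25, ratio=3, seed=0):
    """Fit one classifier per subsample: all positives plus a random draw of negatives."""
    rng = np.random.RandomState(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_neg = min(len(neg), ratio * len(pos))
    models = []
    for _ in range(n_estimators):
        # Each model sees every positive but a different subset of negatives.
        idx = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
        models.append(LogisticRegression().fit(X[idx], y[idx]))
    return models


def predict_majority(models, X):
    """Majority vote over the hard 0/1 predictions of each model."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


def predict_mean_decision(models, X):
    """Average the decision functions, then threshold the mean at 0."""
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return (scores > 0).astype(int)


# Synthetic stand-in for the real features: 500 negatives, 5 positives.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(6.0, 0.5, size=(5, 4))])
y = np.array([0] * 500 + [1] * 5)

models = fit_balanced_ensemble(X, y)
X_new = np.array([[0.0] * 4, [6.0] * 4])
print(predict_majority(models, X_new))
print(predict_mean_decision(models, X_new))
```

An odd number of estimators avoids ties in the vote. Averaging decision functions keeps a continuous score, which is handy if you want to move the threshold afterwards.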
On Aug 4, 2016 8:44 PM, "Amita Misra" <amis...@ucsc.edu> wrote:

> If I train multiple algorithms on different subsamples, then how do I get
> the final classifier that predicts unseen data?
>
> I have very few positive samples since it is speed bump detection and we
> have very few speed bumps in a drive.
> However, I think that unseen new data would be quite similar to what I
> have in training data hence if I can correctly learn a classifier for these
> 5, I hope it should work well for unseen speed bumps.
>
> Thanks,
> Amita
>
> On Thu, Aug 4, 2016 at 5:23 PM, Nicolas Goix <goix.nico...@gmail.com>
> wrote:
>
>> You can evaluate the accuracy of your hyper-parameters on a few samples.
>> Just don't use the accuracy as your performance measure.
>>
>> For supervised classification, training multiple algorithms on small
>> balanced subsamples usually works well, but 5 anomalies seems indeed to be
>> very little.
>>
>> Nicolas
>>
>> On Aug 4, 2016 7:51 PM, "Amita Misra" <amis...@ucsc.edu> wrote:
>>
>>> SubSample would remove a lot of information from the negative class.
>>> I have more than 500 samples of negative class and just 5 samples of
>>> positive class.
>>>
>>> Amita
>>>
>>> On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix <goix.nico...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes you can use your labeled data (you will need to sub-sample your
>>>> normal class to have similar proportion normal-abnormal) to learn your
>>>> hyper-parameters through CV.
>>>>
>>>> You can also try to use supervised classification algorithms on `not
>>>> too highly unbalanced' sub-samples.
>>>>
>>>> Nicolas
>>>>
>>>> On Thu, Aug 4, 2016 at 5:17 PM, Amita Misra <amis...@ucsc.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am currently exploring the problem of speed bump detection using
>>>>> accelerometer time series data.
>>>>> I have extracted some features based on mean, std deviation etc
>>>>> within a time window.
>>>>>
>>>>> Since the dataset is highly skewed ( I have just 5 positive samples
>>>>> for every > 300 samples)
>>>>> I was looking into
>>>>>
>>>>> OneClassSVM
>>>>> covariance.EllipticEnvelope
>>>>> sklearn.ensemble.IsolationForest
>>>>>
>>>>> but I am not sure how to use them.
>>>>>
>>>>> What I get from docs
>>>>> separate the positive examples and train using only negative examples
>>>>>
>>>>> clf.fit(X_train)
>>>>>
>>>>> and then
>>>>> predict the positive examples using
>>>>> clf.predict(X_test)
>>>>>
>>>>> I am not sure what is then the role of positive examples in my
>>>>> training dataset or how can I use them to improve my classifier so that I
>>>>> can predict better on new samples.
>>>>>
>>>>> Can we do something like Cross validation to learn the parameters as
>>>>> in normal binary SVM classification
>>>>>
>>>>> Thanks,
>>>>> Amita
>>>>>
>>>>> --
>>>>> Amita Misra
>>>>> Graduate Student Researcher
>>>>> Natural Language and Dialogue Systems Lab
>>>>> Baskin School of Engineering
>>>>> University of California Santa Cruz
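On the earlier question in the thread about what role the positive examples play with a one-class model: one possibility is to fit on negatives only and keep the few labeled positives for choosing hyper-parameters, scored with something other than accuracy. The following is a rough sketch under assumptions: the synthetic data, the 400/100 split, the `nu` and `gamma` grids, and the choice of ROC AUC are illustrative, not a recommendation from the docs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

# Synthetic stand-in: 500 normal windows, 5 speed-bump windows.
rng = np.random.RandomState(0)
X_neg = rng.normal(0.0, 1.0, size=(500, 4))
X_pos = rng.normal(6.0, 0.5, size=(5, 4))

# Train on negatives only; hold out some negatives plus all positives
# to compare hyper-parameter settings.
X_train, X_val_neg = X_neg[:400], X_neg[400:]
X_val = np.vstack([X_val_neg, X_pos])
y_val = np.array([0] * len(X_val_neg) + [1] * len(X_pos))

best = None  # (auc, nu, gamma)
for nu in (0.01, 0.05, 0.1):
    for gamma in (0.01, 0.1, 1.0):
        clf = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
        # decision_function is larger for inliers, so negate it to get
        # a score that is larger for anomalies.
        scores = -clf.decision_function(X_val).ravel()
        auc = roc_auc_score(y_val, scores)
        if best is None or auc > best[0]:
            best = (auc, nu, gamma)

print("best AUC=%.3f with nu=%s, gamma=%s" % best)
```

With only 5 positives the validation score is very noisy, so treat the selected values as a rough guide rather than a tuned result.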
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn