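You can still evaluate your hyper-parameters with only a few positive samples. Just don't use accuracy as your performance measure: with 5 positives among more than 500 samples, a classifier that always predicts the negative class already reaches about 99% accuracy. A ranking-based measure such as ROC AUC, or precision and recall on the positive class, is more informative.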
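As a rough sketch of what this could look like: fit the detector on the normal samples of each training fold and score the held-out fold, so that 5 stratified folds leave one positive per fold. Everything specific here is an assumption for illustration, not something taken from your data: the synthetic features standing in for your windowed statistics, IsolationForest as the detector, the small hypothetical grid, and ROC AUC as the measure.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real features (assumption):
# 500 normal windows and 5 speed-bump windows, 4 features each.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)),
               rng.normal(6.0, 1.0, size=(5, 4))])
y = np.array([0] * 500 + [1] * 5)  # 1 = anomaly (speed bump)


def cv_auc(params, X, y, n_splits=5):
    """Mean ROC AUC over stratified folds; the detector only sees normal samples."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        # Train on the negative (normal) class of the training fold only.
        X_train_normal = X[train_idx][y[train_idx] == 0]
        clf = IsolationForest(random_state=0, **params).fit(X_train_normal)
        # decision_function is higher for inliers, so negate it
        # to obtain a score that is higher for anomalies.
        anomaly_score = -clf.decision_function(X[test_idx])
        aucs.append(roc_auc_score(y[test_idx], anomaly_score))
    return float(np.mean(aucs))


# Hypothetical hyper-parameter grid, chosen only for illustration.
grid = [{"n_estimators": n, "max_samples": m}
        for n in (50, 100) for m in (64, 256)]
results = {tuple(sorted(p.items())): cv_auc(p, X, y) for p in grid}
best = max(results, key=results.get)
print(best, results[best])

# For contrast: always predicting "normal" gives this accuracy on the same data.
trivial_accuracy = float(np.mean(y == 0))
print(trivial_accuracy)
```

Average precision (`sklearn.metrics.average_precision_score`) would be another reasonable choice. With a single positive per fold the estimates are noisy, so small differences between settings should be read with caution.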
For supervised classification, training multiple algorithms on small balanced subsamples usually works well, but 5 anomalies does indeed seem to be very few.

Nicolas

On Aug 4, 2016 7:51 PM, "Amita Misra" <[email protected]> wrote:

> SubSample would remove a lot of information from the negative class.
> I have more than 500 samples of negative class and just 5 samples of
> positive class.
>
> Amita
>
> On Thu, Aug 4, 2016 at 4:43 PM, Nicolas Goix <[email protected]>
> wrote:
>
>> Hi,
>>
>> Yes you can use your labeled data (you will need to sub-sample your
>> normal class to have similar proportion normal-abnormal) to learn your
>> hyper-parameters through CV.
>>
>> You can also try to use supervised classification algorithms on `not too
>> highly unbalanced' sub-samples.
>>
>> Nicolas
>>
>> On Thu, Aug 4, 2016 at 5:17 PM, Amita Misra <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am currently exploring the problem of speed bump detection using
>>> accelerometer time series data.
>>> I have extracted some features based on mean, std deviation etc within
>>> a time window.
>>>
>>> Since the dataset is highly skewed (I have just 5 positive samples for
>>> every > 300 samples)
>>> I was looking into
>>>
>>> OneClassSVM
>>> covariance.EllipticEnvelope
>>> sklearn.ensemble.IsolationForest
>>>
>>> but I am not sure how to use them.
>>>
>>> What I get from docs
>>> separate the positive examples and train using only negative examples
>>>
>>> clf.fit(X_train)
>>>
>>> and then
>>> predict the positive examples using
>>> clf.predict(X_test)
>>>
>>> I am not sure what is then the role of positive examples in my training
>>> dataset or how can I use them to improve my classifier so that I can
>>> predict better on new samples.
>>>
>>> Can we do something like Cross validation to learn the parameters as in
>>> normal binary SVM classification
>>>
>>> Thanks,
>>> Amita
>>>
>>> --
>>> Amita Misra
>>> Graduate Student Researcher
>>> Natural Language and Dialogue Systems Lab
>>> Baskin School of Engineering
>>> University of California Santa Cruz
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
