Thanks for the pointers and papers. I'll definitely go through this approach
and see if it can be applied to my problem.

Thanks,
Amita

On Fri, Aug 5, 2016 at 4:40 PM, Albert Thomas <albertthoma...@gmail.com>
wrote:

> Hi,
>
> About your question on how to learn the parameters of anomaly detection
> algorithms using only the negative samples in your case, Nicolas and I
> worked on this aspect recently. If you are interested, you can have a look at:
>
> - Learning hyperparameters for unsupervised anomaly detection:
> https://drive.google.com/file/d/0B8Dg3PBX90KNUTg5NGNOVnFPX0hDNmJsSTcybzZMSHNPYkd3/view
> - How to evaluate the quality of unsupervised anomaly detection
> algorithms?:
> https://drive.google.com/file/d/0B8Dg3PBX90KNenV3WjRkR09Bakx5YlNyMF9BUXVNem1hb0NR/view
>
> Best,
> Albert
>
> On Fri, Aug 5, 2016 at 9:34 PM Sebastian Raschka <
> m...@sebastianraschka.com> wrote:
>
>> > But this might be the kind of problem where you seriously ask how hard
>> > it would be to gather more data.
>>
>>
>> Yeah, I agree, but this scenario is then typical in the sense that it is
>> an anomaly detection problem rather than a classification problem. I.e.,
>> you don’t have enough positive labels to fit the model, and thus you need to
>> do unsupervised learning to learn from the negative class only.
>>
>> Sure, supervised learning could work well, but I would also explore
>> unsupervised learning here and see how that works for you; maybe a one-class
>> SVM as suggested, or EM-based mixture models
>> (http://scikit-learn.org/stable/modules/mixture.html).
>>
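The mixture-model suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the data is synthetic, the number of components and the 1% threshold are arbitrary illustrative choices, and it uses the `GaussianMixture` class from current scikit-learn releases. The idea is to fit a density to the negative (normal) windows only and flag new windows whose log-likelihood is unusually low.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)

# Synthetic stand-in for windowed features (e.g. mean and std per window).
# These values are made up for illustration, not taken from the thread.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(300, 2))      # normal road windows
X_new = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 2)),           # more normal windows
    rng.normal(loc=6.0, scale=1.0, size=(5, 2)),            # hypothetical speed bumps
])

# Fit the density model (EM under the hood) on the negative class only.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X_neg)

# score_samples returns the per-sample log-likelihood under the fitted mixture.
train_ll = gmm.score_samples(X_neg)

# Assumed rule: flag anything less likely than the lowest 1% of training windows.
threshold = np.percentile(train_ll, 1)
is_anomaly = gmm.score_samples(X_new) < threshold

print("flagged", int(is_anomaly.sum()), "of", len(X_new), "new windows")
```

The threshold is the main knob here; it trades false alarms against missed bumps and would normally be tuned on held-out data.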
>> Best,
>> Sebastian
>>
>> > On Aug 5, 2016, at 2:55 PM, Jared Gabor <jgabor.as...@gmail.com> wrote:
>> >
>> > Lots of great suggestions on how to model your problem. But this might
>> > be the kind of problem where you seriously ask how hard it would be to
>> > gather more data.
>> >
>> > On Thu, Aug 4, 2016 at 2:17 PM, Amita Misra <amis...@ucsc.edu> wrote:
>> > Hi,
>> >
>> > I am currently exploring the problem of speed bump detection using
>> > accelerometer time series data.
>> > I have extracted some features based on mean, standard deviation, etc.
>> > within a time window.
>> >
>> > Since the dataset is highly skewed (I have just 5 positive samples
>> > for every 300+ samples), I was looking into
>> >
>> > sklearn.svm.OneClassSVM
>> > sklearn.covariance.EllipticEnvelope
>> > sklearn.ensemble.IsolationForest
>> >
>> > but I am not sure how to use them.
>> >
>> > What I get from the docs:
>> >
>> > separate out the positive examples and train using only the negative examples
>> > clf.fit(X_train)
>> > and then
>> > predict the positive examples using
>> > clf.predict(X_test)
>> >
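The fit-on-negatives, predict-on-everything workflow described above can be sketched as follows. The data is synthetic and the hyperparameters (`gamma`, `nu`, `n_estimators`) are arbitrary assumptions chosen only to make the example run; the estimators and their `fit`/`predict` methods are the real scikit-learn ones. Note that `predict` returns +1 for inliers and -1 for outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Synthetic stand-in for the windowed accelerometer features (assumption):
# 300 normal windows and 5 speed-bump windows, mirroring the 5-in-300 skew.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_pos = rng.normal(loc=6.0, scale=1.0, size=(5, 2))

# Train on negatives only; keep some negatives plus all positives for testing.
X_train = X_neg[:200]
X_test = np.vstack([X_neg[200:], X_pos])
y_test = np.r_[np.ones(100), -np.ones(5)]   # +1 = inlier, -1 = outlier

models = {
    "OneClassSVM": OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05),
    "IsolationForest": IsolationForest(n_estimators=100, random_state=0),
}

predictions = {}
for name, clf in models.items():
    clf.fit(X_train)                          # no labels are passed
    pred = clf.predict(X_test)                # array of +1 / -1
    predictions[name] = pred
    caught = int((pred[y_test == -1] == -1).sum())
    false_alarms = int((pred[y_test == 1] == -1).sum())
    print(name, "caught", caught, "of 5 bumps with", false_alarms, "false alarms")
```

In this setup the positive examples play no part in fitting; they are only used afterwards to check how many bumps the detector catches.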
>> >
>> > I am not sure what the role of the positive examples in my training
>> > dataset is then, or how I can use them to improve my classifier so that
>> > I can predict better on new samples.
>> >
>> >
>> > Can we do something like cross-validation to learn the parameters, as in
>> > normal binary SVM classification?
>> >
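One possible way to use the few positive examples for parameter selection is sketched below. It is an assumption-laden illustration, not a method taken from the papers linked in this thread: each fold trains a one-class SVM on negatives only, then scores the held-out negatives together with all positives, and the parameter pair with the best mean ROC AUC wins. The data, the helper name `cv_auc`, and the parameter grid are all hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_neg = rng.normal(loc=0.0, scale=1.0, size=(300, 2))   # synthetic normal windows
X_pos = rng.normal(loc=4.0, scale=1.0, size=(5, 2))     # synthetic speed bumps


def cv_auc(gamma, nu, n_splits=5):
    """Mean ROC AUC over folds: fit on negatives, validate on held-out negatives + positives."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    aucs = []
    for train_idx, val_idx in folds.split(X_neg):
        clf = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X_neg[train_idx])
        X_val = np.vstack([X_neg[val_idx], X_pos])
        y_val = np.r_[np.zeros(len(val_idx)), np.ones(len(X_pos))]   # 1 = anomaly
        # decision_function is larger for inliers, so negate it to rank anomalies first.
        scores = -clf.decision_function(X_val).ravel()
        aucs.append(roc_auc_score(y_val, scores))
    return float(np.mean(aucs))


# Hypothetical grid; sensible ranges depend on the feature scaling.
grid = [(gamma, nu) for gamma in (0.01, 0.1, 1.0) for nu in (0.01, 0.05, 0.1)]
results = {params: cv_auc(*params) for params in grid}
best_params = max(results, key=results.get)
print("best (gamma, nu):", best_params, "mean AUC: %.3f" % results[best_params])
```

Because the same five positives appear in every validation fold, the resulting score is noisy and optimistic; with so few positives it is a rough guide rather than a reliable estimate.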
>> > Thanks,
>> > Amita
>> >
>> > Amita Misra
>> > Graduate Student Researcher
>> > Natural Language and Dialogue Systems Lab
>> > Baskin School of Engineering
>> > University of California Santa Cruz
>> >
>> >
>> >
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn@python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>>
>>
>
>
>


-- 
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz