It's unclear to me what exactly you want to do with the classification algorithm. Is your goal to take a binary data matrix indicating the presence of certain k-mers and predict whether the k-mers present indicate a susceptible or a resistant genome? If so, you need to convert your sequences into this binary matrix (or a count matrix, if you think counts matter more), where each row corresponds to a genome and each column to a k-mer. I don't think scikit-learn has any built-in tools for turning a string into a k-mer encoding (a possible future PR?), so you'd have to do this manually. Let me know if that answers your question.
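To make the manual encoding concrete, here is a minimal sketch in plain Python with NumPy. The function name `kmer_matrix`, the toy genomes, and the choice of k=3 are all made up for illustration; real genomes would likely need a sparse matrix and some handling of ambiguous bases, which this sketch ignores.

```python
import numpy as np

def kmer_matrix(sequences, k, binary=True):
    """Encode sequences as a (genomes x k-mers) matrix.

    Hypothetical helper for illustration only, not part of scikit-learn.
    Returns the matrix and the list of k-mers labelling its columns.
    """
    # Collect every distinct k-mer seen in any sequence; sort for a stable column order.
    vocab = sorted({seq[i:i + k]
                    for seq in sequences
                    for i in range(len(seq) - k + 1)})
    column = {kmer: j for j, kmer in enumerate(vocab)}

    # Count occurrences of each k-mer in each sequence (one row per genome).
    X = np.zeros((len(sequences), len(vocab)), dtype=int)
    for row, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            X[row, column[seq[i:i + k]]] += 1

    # Collapse counts to presence/absence if a binary matrix is wanted.
    if binary:
        X = (X > 0).astype(int)
    return X, vocab

# Toy data, assumed for the example.
genomes = ["ACGTAC", "CGTTTT"]
X, kmers = kmer_matrix(genomes, k=3)
print(kmers)
print(X)
```

The resulting `X`, together with a vector of susceptible/resistant labels, can then be passed to the `fit(X, y)` method of a scikit-learn classifier. It may also be worth checking whether `CountVectorizer` with `analyzer="char"` and `ngram_range=(k, k)` gives you the same kind of matrix, since character n-grams over a sequence string are essentially k-mers.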
On Tue, Jun 13, 2017 at 12:36 PM, Daniel Harris <daphi...@umich.edu> wrote:
> Hello,
>
> I hope this is the correct email address for questions regarding support.
> I posted my question here on stack exchange:
> https://bioinformatics.stackexchange.com/q/702/842
>
> Thank you,
> Daniel
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn