Hi Vladan, and welcome to sklearn :)
I think what you describe is a particular transductive setting in which you have training labels for some classes, but not all. Transductive means that you know beforehand which data you want to predict on (i.e. you can use all your data, have labels on some of it, and infer the labels for the rest).
I don't think there is anything in scikit-learn that is particularly tailored to your situation.
Do you know in advance how many different labels (classes) there are?
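
To make the transductive setting concrete, here is a minimal sketch using scikit-learn's LabelSpreading, a transductive semi-supervised estimator, on the digits data with only some samples of digits 0-4 labeled. The 30% labeling rate and the knn kernel settings are arbitrary assumptions. Note that it can only ever assign classes it has already seen, which is why it does not solve the "new classes" problem on its own.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelSpreading

X, y = load_digits(return_X_y=True)
rng = np.random.RandomState(0)

# Hide every label except a random ~30% of the samples from digits 0-4.
# -1 marks "unlabeled" for sklearn's semi-supervised estimators.
y_partial = np.full_like(y, -1)
known = (y < 5) & (rng.rand(len(y)) < 0.3)
y_partial[known] = y[known]

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)  # fit on ALL samples: this is the transductive part

# transduction_ holds a label for every sample, but only from the seen classes
inferred = model.transduction_
print(sorted(set(inferred.tolist())))
```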

The simplest solution that comes to my mind is to run a clustering algorithm on the whole data set, then assign labels to the clusters using the training labels you have; if a cluster doesn't have enough labeled points, declare it a new label.
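
A minimal sketch of that cluster-then-label idea follows. It assumes the number of classes is known in advance (here 10, for the digits scenario from the original question) and uses KMeans as the clustering step. `cluster_then_label` and its `min_labeled` threshold are hypothetical names chosen for illustration, not part of sklearn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
rng = np.random.RandomState(0)

# Simulate the scenario: only some samples of digits 0-4 are labeled (-1 = unknown)
y_partial = np.full_like(y, -1)
known = (y < 5) & (rng.rand(len(y)) < 0.3)
y_partial[known] = y[known]


def cluster_then_label(X, y_partial, n_clusters, min_labeled=5, random_state=0):
    """Hypothetical helper: cluster all data, then name clusters by their labeled members.

    Clusters with fewer than `min_labeled` labeled points get a fresh label.
    `min_labeled` is an illustrative threshold, not a tuned value.
    """
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit_predict(X)
    next_new = y_partial.max() + 1  # first id not used by a known class
    cluster_to_label = {}
    for c in range(n_clusters):
        labels_in_c = y_partial[(clusters == c) & (y_partial != -1)]
        if len(labels_in_c) >= min_labeled:
            # majority vote among the labeled points in this cluster
            cluster_to_label[c] = np.bincount(labels_in_c).argmax()
        else:
            cluster_to_label[c] = next_new  # declare a new class
            next_new += 1
    return np.array([cluster_to_label[c] for c in clusters])


pred = cluster_then_label(X, y_partial, n_clusters=10)
print(np.unique(pred))
```

Labels 0-4 refer to the known digits; any label from 5 upward is a newly declared group, whose numbering is arbitrary and need not match the true digit.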

If you want to do it "right", I would write down a generative model that says something about how classes come into existence and then do inference in that model. For example, if each class is well modeled by a Gaussian, you could fit a GMM (Gaussian mixture model) to your data while enforcing that samples that share a label belong to the same component.
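
One way to enforce that constraint is a small EM loop in which labeled samples have their responsibility fixed to their own class's component, while unlabeled samples are assigned softly. As far as I know, sklearn's GaussianMixture has no option to pin individual samples to components, so this is a hand-written sketch in NumPy/SciPy. `constrained_gmm`, the regularisation `reg`, and the synthetic make_blobs data are all assumptions for illustration. It also assumes the known labels are 0..m-1 and that `n_components` (known classes plus room for new ones) is chosen up front.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.datasets import make_blobs


def constrained_gmm(X, y_partial, n_components, n_iter=100, reg=1e-6, seed=0):
    """Hypothetical EM for a full-covariance GMM in which labeled samples are pinned.

    A sample with y_partial[i] == k (k >= 0) always belongs to component k;
    samples marked -1 are assigned softly. Components with no labeled
    samples are free to model classes absent from the training labels.
    """
    n, d = X.shape
    rng = np.random.RandomState(seed)
    idx = np.flatnonzero(y_partial >= 0)  # indices of labeled samples

    # Init: known components at their labeled means, the others at random samples
    means = X[rng.choice(n, n_components, replace=False)].copy()
    for k in np.unique(y_partial[idx]):
        means[k] = X[y_partial == k].mean(axis=0)
    covs = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(n_components)])
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: log of weight * density for every (sample, component) pair
        log_p = np.column_stack([
            np.log(weights[k]) + multivariate_normal.logpdf(X, means[k], covs[k])
            for k in range(n_components)
        ])
        resp = np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))
        # Constraint: labeled samples get a one-hot responsibility on their class
        resp[idx] = 0.0
        resp[idx, y_partial[idx]] = 1.0

        # M-step: usual weighted updates of weights, means and covariances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return resp.argmax(axis=1)


# Synthetic data: 5 blobs, but labels are given only for a few samples of 3 of them
X, y = make_blobs(n_samples=500, centers=5, cluster_std=0.8, random_state=0)
rng = np.random.RandomState(0)
y_partial = np.full_like(y, -1)
known = (y < 3) & (rng.rand(len(y)) < 0.2)
y_partial[known] = y[known]

pred = constrained_gmm(X, y_partial, n_components=5)
print(np.bincount(pred, minlength=5))
```

Because every sample sharing a label is forced into the same component, components 0-2 stay tied to the known classes, and components 3-4 can only be explained by unlabeled data, which is where new classes would show up.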

Hope that helps at least a bit.

Cheers,
Andy


On 04/06/2013 04:46 PM, Vladan Divljak wrote:
Hello,

I watched both excellent tutorials from PyCon 2013 on YouTube and, although I don't have a strong background in statistics, encouraged by this fast food and by Andy's Machine Learning Cheat Sheet on screen, I thought I'd try something out.

I have a large set of signals, with a spectral signature extracted for each one (it's not astronomy). I classify these signals manually, because I already tried in the past to find some simple correlation between the signatures and the classification groups, but didn't find anything reliable. I could try some signal processing and heavy statistics, but that's far from trivial and I'm not sure I have what it takes to go there.

My problem in getting started is this: I don't have all the target classification groups upfront, so a new signal may not belong to any of the existing classification groups but introduce a new one. This makes it hard for me to find a route and get started. If what I said is not clear, let me describe it differently. Imagine the digits example that comes with sklearn, and imagine that I can classify only a couple of digits (0, 1, 2, 3, 4) and train a model on a limited set of already classified digits. When I then probe other digits (like 5, 6, 7, 8, 9), I want the model to be able to put each of them into its own separate group. Does this make sense? Is it possible at all?


Thanks in advance


------------------------------------------------------------------------------
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs.Learn how to hire
the most talented Cisco Certified professionals. Visit the
Employer Resources Portal
http://www.cisco.com/web/learning/employer_resources/index.html


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
