Re: [Scikit-learn-general] Noob trying to classify

Andreas Mueller Sat, 06 Apr 2013 08:00:22 -0700

Hi Vladan and welcome to sklearn :)

I think what you describe is a particular transductive setting in which you have training labels for some classes, but not all. Transductive means that you know beforehand which data you want to predict on (i.e. you can use all your data, have labels on some, and infer the labels on the others).

I don't think there is anything in scikit-learn that is particularly tailored to your situation.
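To illustrate why the standard transductive tools don't fit, here is a minimal sketch using scikit-learn's `LabelSpreading` on the digits data, set up as in your question: digits 0-4 labeled, digits 5-9 marked `-1`, which is how scikit-learn's semi-supervised estimators mark unlabeled samples. The RBF kernel and `gamma=0.05` are my own illustrative, untuned choices. Since the model can only propagate labels it has already seen, every 5-9 ends up assigned to one of 0-4.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelSpreading

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel intensities run 0..16; rescale to [0, 1]

# Transductive setup: all samples are available, but only digits 0-4 carry labels.
# -1 marks "unlabeled" for scikit-learn's semi-supervised estimators.
y_partial = np.where(y < 5, y, -1)

# gamma=0.05 is an untuned, illustrative choice for the RBF similarity graph.
model = LabelSpreading(kernel="rbf", gamma=0.05, max_iter=100)
model.fit(X, y_partial)

# transduction_ holds the inferred label for every sample, labeled or not.
inferred = model.transduction_
print("known classes:", model.classes_)
print("labels given to digits 5-9:", np.unique(inferred[y >= 5]))
```

Whatever the kernel settings, the output is restricted to `model.classes_`, so no new group can ever appear.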

Do you know how many labels there are in advance?

The simplest solution that comes to my mind is to just use a clustering method on the whole data, then assign labels to clusters via the training labels you have, and if a cluster doesn't have enough labeled points, declare it a new label.
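A minimal sketch of that cluster-then-label idea, assuming k-means as the clustering method, a known number of clusters, and a hypothetical threshold `min_labeled` for "enough labeled points" (both `cluster_then_label` and `min_labeled` are names I made up for illustration). Six synthetic, well-separated blobs stand in for your spectral signatures, and only three of them are labeled.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def cluster_then_label(X, y_partial, n_clusters, min_labeled=5, random_state=0):
    """Cluster all samples, name each cluster after the majority label of its
    labeled members, and give clusters with too few labeled points a new label.

    y_partial uses -1 for unlabeled samples and 0..K-1 for known classes.
    """
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=random_state).fit_predict(X)
    next_new = int(y_partial.max()) + 1  # first label id not used by a known class
    cluster_to_label = {}
    for c in range(n_clusters):
        members = y_partial[cluster_ids == c]
        known = members[members >= 0]
        if len(known) >= min_labeled:
            # Majority vote among the labeled points in this cluster.
            cluster_to_label[c] = int(np.bincount(known).argmax())
        else:
            # Not enough labeled evidence: treat the cluster as a new class.
            cluster_to_label[c] = next_new
            next_new += 1
    return np.array([cluster_to_label[c] for c in cluster_ids])


# Toy stand-in for the signatures: six well-separated groups, of which only
# groups 0-2 were classified by hand.
centers = [[0, 0], [10, 0], [0, 10], [10, 10], [20, 0], [0, 20]]
X, y = make_blobs(n_samples=600, centers=centers, cluster_std=0.5, random_state=0)
y_partial = np.where(y < 3, y, -1)

pred = cluster_then_label(X, y_partial, n_clusters=6)
print(np.unique(pred))
```

The ids handed out for new groups are arbitrary; they only say "a group not seen in training". On real data the number of clusters and the threshold would need tuning.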

If you want to do it "right", I would write down a generative model that says something about how classes come into existence and then do inference in that. For example, if each class is well modeled by a Gaussian, you could fit a GMM to your data where you enforce that samples that share a label belong to the same component.
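Here is a minimal NumPy sketch of such a constrained GMM, under assumptions of my own: diagonal covariances, the number of new classes is known in advance (so `n_components` equals the known labels plus the new ones), and the extra components are initialized by k-means on the unlabeled points. As far as I know scikit-learn's `GaussianMixture` has no option for pinning samples to components, so the EM loop is written out by hand; `constrained_gmm` is a hypothetical helper name.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def constrained_gmm(X, y_partial, n_components, n_iter=50, reg=1e-3, seed=0):
    """EM for a diagonal-covariance GMM in which every labeled sample
    (y_partial >= 0) is pinned to the component whose index equals its label.
    Components beyond the known labels (n_components must exceed them) are
    free to model new classes."""
    n = len(X)
    labeled = y_partial >= 0
    n_known = int(y_partial.max()) + 1
    pinned = np.eye(n_components)[y_partial[labeled]]  # one-hot rows for labeled samples

    # Initial responsibilities: labeled samples sit on their own component;
    # unlabeled samples are hard-assigned to the extra components by k-means.
    resp = np.zeros((n, n_components))
    resp[labeled] = pinned
    km_ids = KMeans(n_clusters=n_components - n_known, n_init=10,
                    random_state=seed).fit_predict(X[~labeled])
    resp[np.flatnonzero(~labeled), n_known + km_ids] = 1.0

    for _ in range(n_iter):
        # M-step: mixture weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        var = resp.T @ X**2 / nk[:, None] - means**2 + reg
        # E-step: log N(x | mean_k, diag(var_k)) + log weight_k for every (i, k).
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        log_prob = -0.5 * (np.log(2 * np.pi * var).sum(axis=1)[None, :]
                           + (diff2 / var[None, :, :]).sum(axis=2))
        log_post = log_prob + np.log(weights)[None, :]
        resp = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
        # Enforce the constraint: labeled samples never leave their component.
        resp[labeled] = pinned
    return resp.argmax(axis=1), means


centers = [[0, 0], [10, 0], [0, 10], [10, 10], [20, 0], [0, 20]]
X, y = make_blobs(n_samples=600, centers=centers, cluster_std=0.5, random_state=0)
y_partial = np.where(y < 3, y, -1)  # only groups 0-2 are labeled

pred, means = constrained_gmm(X, y_partial, n_components=6)
print(np.unique(pred))
```

On well-separated toy data this just reproduces the clustering result; the point of the generative version is that the labeled samples shape the known components, and you can swap in richer per-class models later.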

Hope that helps at least a bit.

Cheers,
Andy


On 04/06/2013 04:46 PM, Vladan Divljak wrote:

Hello,
I watched both excellent tutorials from PyCon 2013 on YouTube and, although I don't have a strong background in statistics, encouraged by this fast food and Andy's Machine Learning Cheat Sheet on screen, I thought I'd try something out.
I have a large set of signals with extracted spectral signatures for each (it's not astronomy). I classify these signals manually, as I already tried in the past to detect some simple correlation between the signatures and the classification groups, but I didn't find anything reliable. I could try some signal processing and heavy statistics, but that's far from trivial and I'm not sure I have the right potential to go there.
My problem in getting started is this: I don't have all target classification groups up front, so a new signal may not belong to any of the already existing classification groups but introduce a new one. This is making it hard for me to find a route and get started. If what I said is not intelligible, I'll try to describe it differently: imagine the digits example that comes with sklearn; now imagine that I can classify only a couple of digits (0, 1, 2, 3, 4) and train a model on a limited set of already classified digits, and now when I probe other digits (like 5, 6, 7, 8, 9), I want the model to be able to put each of those into a separate group accordingly. Does this make sense? Is it possible at all?
Thanks in advance


------------------------------------------------------------------------------
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs.Learn how to hire
the most talented Cisco Certified professionals. Visit the
Employer Resources Portal
http://www.cisco.com/web/learning/employer_resources/index.html


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
