Hi Vladan, and welcome to sklearn :)
I think what you describe is a particular transductive setting in
which you have training labels for some classes, but not all.
Transductive means that you know beforehand which data you want to
predict on (i.e. you can use all your data, have labels on
some, and infer the labels on the others).
I don't think there is anything in scikit-learn that is particularly
tailored to your situation.
Do you know in advance how many labels (classes) there are?
The simplest solution that comes to my mind is to just run a clustering
algorithm on the whole data set,
then assign labels to clusters via the training labels you have. If
a cluster doesn't have enough labeled points, declare it a new
label.
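A rough sketch of that clustering idea, using the digits data from the
question. Everything specific here is my assumption, not something fixed
by the problem: KMeans as the clustering algorithm, a known number of
clusters (10), -1 as the marker for "no label", the hypothetical helper
name label_clusters and the min_labeled threshold.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits


def label_clusters(X, y_partial, n_clusters, min_labeled=5, random_state=0):
    """Cluster all samples, then name each cluster from the known labels.

    y_partial holds the known class for labeled samples and -1 otherwise.
    min_labeled is an assumed threshold for "enough labeled points".
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    clusters = km.fit_predict(X)
    names = np.empty(len(X), dtype=object)
    for c in np.unique(clusters):
        members = clusters == c
        known = y_partial[members & (y_partial != -1)]
        if len(known) >= min_labeled:
            # Majority vote among the labeled points in this cluster.
            names[members] = str(Counter(known.tolist()).most_common(1)[0][0])
        else:
            # Too few labeled points: treat the cluster as a new class.
            names[members] = "new_%d" % c
    return names, clusters


digits = load_digits()
X, y = digits.data, digits.target
# Pretend only the digits 0-4 were classified by hand.
y_partial = np.where(y < 5, y, -1)
names, clusters = label_clusters(X, y_partial, n_clusters=10)
print(Counter(names))
```

A plain count threshold is crude: a cluster that is mostly a new digit
but contains a handful of mislabeled-looking 1s will still be named "1".
Thresholding on the labeled fraction of the cluster might work better.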
If you want to do it "right", I would write down a generative model that
says something about how classes come into existence and then do
inference in that.
For example, if each class is well modeled by a Gaussian, you could fit
a GMM to your data where you enforce that samples that share a label
belong to the same component.
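As far as I know the mixture estimators in scikit-learn don't expose
such a constraint, so here is a hand-rolled sketch in NumPy instead. It
is only one possible reading of the idea: EM for a diagonal-covariance
GMM where labeled samples of class k are pinned to component k and the
remaining components are free to pick up new classes. The function name
fit_constrained_gmm, the -1 marker for unlabeled samples, the random
initialization and the reg term are all assumptions of mine.

```python
import numpy as np
from scipy.special import logsumexp


def fit_constrained_gmm(X, y_partial, n_components, n_iter=50, reg=1e-3, seed=0):
    """EM for a diagonal GMM with labeled samples clamped to a component.

    y_partial: -1 for unlabeled samples, otherwise a class index in
    0..n_known-1; class k is tied to component k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labeled = y_partial >= 0
    # One-hot responsibilities for the labeled samples (never updated).
    clamp = np.zeros((labeled.sum(), n_components))
    clamp[np.arange(labeled.sum()), y_partial[labeled]] = 1.0
    # Random soft assignments for everything else.
    resp = rng.dirichlet(np.ones(n_components), size=n)
    resp[labeled] = clamp
    for _ in range(n_iter):
        # M-step: weights, means and per-feature variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        var = resp.T @ (X ** 2) / nk[:, None] - means ** 2 + reg
        # E-step: posterior over components for every sample.
        diff2 = (X[:, None, :] - means[None]) ** 2
        log_prob = -0.5 * (diff2 / var[None] + np.log(2 * np.pi * var[None])).sum(axis=2)
        log_resp = np.log(weights)[None] + log_prob
        log_resp -= logsumexp(log_resp, axis=1, keepdims=True)
        resp = np.exp(log_resp)
        # Enforce the constraint: labeled samples stay in their component.
        resp[labeled] = clamp
    return resp.argmax(axis=1), means


# Toy data: three blobs, with labels known for part of the first two only.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
y_true = np.repeat(np.arange(3), 100)
X_toy = centers[y_true] + rng.normal(size=(300, 2))
y_toy = np.full(300, -1)
y_toy[:50] = 0
y_toy[100:150] = 1
components, means = fit_constrained_gmm(X_toy, y_toy, n_components=3)
```

Unlabeled samples that end up in a component with no tied class
(component 2 here) would be the candidates for a new class. Plain EM like
this can get stuck in poor local optima, so a better initialization
(e.g. from a clustering run) is probably worth it on real data.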
Hope that helps at least a bit.
Cheers,
Andy
On 04/06/2013 04:46 PM, Vladan Divljak wrote:
Hello,
I watched both excellent tutorials from PyCon 2013 on YouTube and,
although without a strong background in statistics, encouraged by this
fast food and Andy's Machine Learning Cheat Sheet on screen, I thought
I'd try something out.
I have a large set of signals with extracted spectral signatures for
each (it's not astronomy). I classify these signals manually, as I
already tried in the past to detect some simple correlation between
the signatures and classification groups, but I didn't find anything
reliable. I could try some signal processing and heavy statistics, but
that's far from trivial and I'm not sure I have the right potential to go
there.
My problem getting started is this: I don't have all target
classification groups up front, so a new signal may not belong to any of
the existing classification groups, but introduce a new one. This is
making it hard for me to find a route and get started. If what I said
is not intelligible, I'll try to describe it differently: imagine the
digits example that comes with sklearn; now imagine that I can
classify only a few digits (0,1,2,3,4) and train a model on a limited
set of already classified digits, and now when I probe another digit
(like 5,6,7,8,9), I want the model to be able to separate each of those
into its own group accordingly. Does this make sense? Is it possible
at all?
Thanks in advance
------------------------------------------------------------------------------
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs.Learn how to hire
the most talented Cisco Certified professionals. Visit the
Employer Resources Portal
http://www.cisco.com/web/learning/employer_resources/index.html
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general