Hello,

I have a text classification problem with about 50 classes, for which I have
trained 50 binary classifiers (one per topic). The training set for each topic
classifier is different (some instances may overlap). Each instance is a text
snippet that is transformed with a tf-idf vectorizer fitted on that topic's
training set. I am using a linear SVM for each of the classifiers.

Now I am trying to build a web service on top of this architecture which,
given a new snippet of text, returns the scores for each of the topics
([p(Topic), p(Not-Topic)] in each case). As I understand it, for each new
snippet I have to run the text through each topic's tf-idf vectorizer (50
transformations) and then pass each transformed vector to the corresponding
topic classifier. I would like to reduce the number of transformations:
instead of transforming the text 50 times, I want to somehow combine all the
topic information, compute the tf-idf of the new text once, and run that
single vector through each of the classifiers. Is this possible with
scikit-learn? Is there a particular type of vectorizer that addresses
problems like this?
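
To make the idea concrete, here is a minimal sketch of the single-vectorizer
setup I have in mind. It is only an illustration under some assumptions: the
topic names, the toy snippets, and the helper score_snippet are made up, there
are 2 topics instead of 50, and I use LinearSVC, which has no predict_proba,
so the sketch returns decision_function margins rather than probabilities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: topic -> (snippets, labels), 1 = Topic, 0 = Not-Topic.
# Each topic has its own training set; some snippets overlap across topics.
training_sets = {
    "sports": (
        ["the team won the match", "a great goal in the final",
         "stocks fell sharply today", "new phone released this week"],
        [1, 1, 0, 0],
    ),
    "finance": (
        ["stocks fell sharply today", "the bank raised interest rates",
         "the team won the match", "new phone released this week"],
        [1, 1, 0, 0],
    ),
}

# Fit ONE shared vectorizer on the union of all training snippets, so every
# classifier sees the same vocabulary and the same idf weights.
all_snippets = sorted(
    {s for snippets, _ in training_sets.values() for s in snippets}
)
vectorizer = TfidfVectorizer()
vectorizer.fit(all_snippets)

# Train one binary classifier per topic on that shared feature space.
classifiers = {}
for topic, (snippets, labels) in training_sets.items():
    clf = LinearSVC()
    clf.fit(vectorizer.transform(snippets), labels)
    classifiers[topic] = clf


def score_snippet(text):
    """Transform the text once, then score it with every topic classifier."""
    X = vectorizer.transform([text])  # the only tf-idf transformation
    return {
        topic: float(clf.decision_function(X)[0])
        for topic, clf in classifiers.items()
    }


scores = score_snippet("the team scored a late goal")
print(scores)
```

The trade-off, as far as I can tell, is that the idf weights now come from the
pooled corpus rather than from each topic's own training set, so every
classifier has to be retrained on the shared features.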

Thanks,
Nikhil
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
