Hello,

I have a text classification problem with about 50 classes, handled by 50 binary classifiers (one per topic). Each topic classifier is trained on a different training set, although some instances may overlap. Each instance is a text snippet transformed with a tf-idf vectorizer, and each classifier is a linear SVM.

I am now trying to build a web service on top of this architecture: given a new snippet of text, the service returns the scores for each topic ([p(Topic), p(Not-Topic)] in each case). As I understand it, for each new snippet I will have to run the text through 50 tf-idf vectorizers, one per topic, and then pass each resulting vector to the corresponding topic classifier.

I would like to minimize the number of transformations: instead of transforming the text 50 times, I want to combine the topic information so that I compute tf-idf for the new text once and run that single vector through each of the classifiers. Is this possible with scikit-learn? Is there a particular type of vectorizer that addresses problems like this?
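To make the question concrete, here is a minimal sketch of the "transform once" setup I have in mind. It rests on assumptions that are not part of my current system: a single TfidfVectorizer fit on the union of all topic training texts, LinearSVC as the linear SVM, and toy data with made-up topic names. Note that decision_function returns signed margins rather than probabilities, so getting [p(Topic), p(Not-Topic)] would presumably need an extra calibration step (for example CalibratedClassifierCV).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy per-topic training sets (hypothetical data): topic -> (texts, binary labels).
# Some instances overlap between topics, as in the real setup.
topic_data = {
    "sports": (
        ["the team won the match", "a great goal in the final",
         "stock prices fell today", "the election results are in"],
        [1, 1, 0, 0],
    ),
    "finance": (
        ["stock prices fell today", "the bank raised interest rates",
         "the team won the match", "new phone released this week"],
        [1, 1, 0, 0],
    ),
}

# Assumption: fit ONE vectorizer on the de-duplicated union of all training
# texts, so every classifier shares the same vocabulary and idf weights.
all_texts = sorted({t for texts, _ in topic_data.values() for t in texts})
vectorizer = TfidfVectorizer()
vectorizer.fit(all_texts)

# Train one binary classifier per topic on the shared feature space.
classifiers = {}
for topic, (texts, labels) in topic_data.items():
    clf = LinearSVC()
    clf.fit(vectorizer.transform(texts), labels)
    classifiers[topic] = clf


def score_snippet(text):
    """Transform the snippet once, then score it with every topic classifier."""
    X = vectorizer.transform([text])  # the only tf-idf transformation
    return {
        topic: float(clf.decision_function(X)[0])  # signed margin, not a probability
        for topic, clf in classifiers.items()
    }


scores = score_snippet("the team scored a late goal")
for topic, score in scores.items():
    print(topic, score)
```

The trade-off in this sketch is that the idf weights come from the pooled corpus rather than from each topic's own training set, so every classifier would have to be retrained on the shared features.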
Thanks,
Nikhil
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general