Hi Nikhil,

Are you fitting a separate, topic-specific TF-IDF transformation for each
classifier? Could you post a small (pseudo)code snippet showing what you're
doing?
I may be wrong, but judging from what you wrote, it doesn't look like you
are using scikit-learn's OneVsRestClassifier
<http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html>.
It fits and manages one binary classifier per class for you. Also, check
out Pipeline
<http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>.
At the moment your pipeline looks simple (just one transformer), but you
may want more elaborate preprocessing in the future.
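
To make the suggestion concrete, here is a minimal sketch of how the two
pieces could fit together. It assumes all topics can share one training set
with multi-label annotations, which may not match your setup of a separate
training set per topic. The toy texts and topic names are made up for
illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy data (hypothetical); a snippet may belong to several topics.
texts = [
    "the striker scored a late goal",
    "parliament passed the budget bill",
    "the goalkeeper saved a penalty",
    "the senate debated the tax bill",
    "the minister attended the cup final",
]
topics = [
    ["sports"],
    ["politics"],
    ["sports"],
    ["politics"],
    ["sports", "politics"],
]

# Turn the topic lists into a binary indicator matrix, one column per topic.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(topics)

# One shared TF-IDF vectorizer feeding one binary LinearSVC per topic.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
model.fit(texts, Y)

# A new snippet is vectorized once, then scored by every topic classifier.
scores = model.decision_function(["the goal came from a penalty"])
for topic, score in zip(binarizer.classes_, scores[0]):
    print(topic, score)
```

With this layout the new text goes through TF-IDF only once, and the result
is passed to every per-topic classifier. Note that decision_function on
LinearSVC returns signed distances to the separating hyperplane, not
probabilities, so the scores are not [p(Topic), p(Not-Topic)] pairs as they
stand.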
On Thu, Jul 2, 2015 at 9:07 PM, nmura...@masonlive.gmu.edu <
nmura...@masonlive.gmu.edu> wrote:
> Hello,
>
> I have a text classification problem with about 50 classes, for which I
> have 50 binary classifiers (one per topic). The training set used to train
> each topic classifier is different (some instances might overlap). Each
> instance consists of a text snippet, which is transformed using a tf-idf
> vectorizer. I am using LinearSVM for each of the classifiers.
> Now I am trying to develop a web service on top of this classification
> architecture where, given a new snippet of text, the service returns the
> scores for each of the topics ([p(Topic), p(Not-Topic)] in each case).
> For the new snippet of text, as I understand it, I will have to transform
> the text 50 times, once with the tf-idf vectorizer of each topic, and
> then pass each tf-idf transformed vector into the corresponding
> topic classifier. I am trying to minimize the number of transformation
> operations: instead of doing the transformation 50 times, I want to
> combine all the topic information, calculate the tf-idf of the new text
> once, and run it through each of the classifiers. Is this possible using
> scikit-learn? Is there a particular type of vectorizer that addresses
> problems like this?
>
> Thanks,
> Nikhil
>
> ------------------------------------------------------------------------------
> Don't Limit Your Business. Reach for the Cloud.
> GigeNET's Cloud Solutions provide you with the tools and support that
> you need to offload your IT needs and focus on growing your business.
> Configured For All Businesses. Start Your Cloud Today.
> https://www.gigenetcloud.com/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>