Classification + glossary usage

Loic Descotte Wed, 21 Sep 2011 08:25:46 -0700

Hi,

I'm currently working on a text classification problem.

As my training datasets are rather small (dozens of entries), I'm looking for a way to use a glossary, in addition to the training phase, to tune the model.

I know that text entries containing certain keywords are very likely to belong to a specific category. With SGD, I tried creating a new Vector from these keywords and adding it to the current category's vector during training.


It works pretty well if I give it a large weight (700).

My "hack" code looks like this:


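      // keywords for the current category
      List<String> keywords = ...

      // add each keyword through the dedicated "keywords" encoder (key 99),
      // with a large weight of 700
      for (String keyword : keywords) {
          predictorEncoders.get(99).addToVector(keyword, 700, featureVector);
      }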
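To see why a weight as large as 700 matters so much, it helps to look at a plain SGD update for logistic regression: the step for feature j is rate * (y - p) * x_j, so a keyword entry of 700 moves its weight 700 times as far as a word entry of 1. The class below (TinySgdLogistic) is a hypothetical, stdlib-only binary sketch, not Mahout's OnlineLogisticRegression, which is multi-class and adds priors and learning-rate annealing on top of this basic update.

```java
// Hypothetical minimal binary logistic regression trained by plain SGD
// (no regularization, constant learning rate). Illustration only; this is
// not Mahout's OnlineLogisticRegression.
final class TinySgdLogistic {
    final double[] weights;
    final double learningRate;

    TinySgdLogistic(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Probability of the positive class: sigmoid(w . x). */
    double predict(double[] x) {
        double score = 0.0;
        for (int j = 0; j < x.length; j++) {
            score += weights[j] * x[j];
        }
        return 1.0 / (1.0 + Math.exp(-score));
    }

    /** One SGD step on the log-likelihood: w_j += rate * (y - p) * x_j. */
    void train(double[] x, int label) {
        double error = label - predict(x);
        for (int j = 0; j < x.length; j++) {
            weights[j] += learningRate * error * x[j];
        }
    }
}
```

Starting from zero weights with a learning rate of 0.01, one positive example {1.0, 700.0} gives p = 0.5, so the ordinary feature's weight becomes 0.005 while the keyword feature's weight becomes 3.5. For any example whose keyword entry is 0, that learned weight contributes 3.5 * 0 = 0 to the score, which relates to the train/test mismatch concern in this message.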

Predictor 99 is a new predictor I created just for these keywords:

// create a "text" encoder named "keywords" (class looked up in TYPE_DICTIONARY)
FeatureVectorEncoder keyWordEncoder =
        TYPE_DICTIONARY.get("text").getConstructor(String.class).newInstance("keywords");

predictorEncoders.put(99, keyWordEncoder);
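A separate encoder named "keywords" is useful on the assumption that a hashed encoder mixes its own name into the hash, so the keyword features get their own slots instead of sharing them with the ordinary text features. The sketch below illustrates that idea with a hypothetical NamedHashEncoder built on String.hashCode(); it is not Mahout's FeatureVectorEncoder, whose hash function and probing differ. The keyword "invoice" is likewise only an example.

```java
// Hypothetical, simplified stand-in for a hashed feature encoder.
// The encoder name is mixed into the hash, so the same word encoded by two
// differently named encoders usually lands in different slots (collisions
// are still possible). This sketch writes exactly one slot per word.
final class NamedHashEncoder {
    private final String name;

    NamedHashEncoder(String name) {
        this.name = name;
    }

    /** Slot used by this encoder for the given word, in [0, dimension). */
    int index(String word, int dimension) {
        return Math.floorMod((name + ":" + word).hashCode(), dimension);
    }

    /** Adds weight to the word's slot; repeated calls accumulate. */
    void addToVector(String word, double weight, double[] vector) {
        vector[index(word, vector.length)] += weight;
    }
}
```

With this design, new NamedHashEncoder("keywords").addToVector("invoice", 700, v) puts 700 into a single slot of v, and a second call with the same word raises that slot to 1400.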

It works pretty well: my confusion matrix is better with this hack. But maybe it's not optimal, because this attribute does not exist in the train/test data.
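One alternative, different from the hack described here, is to compute the glossary feature from each entry's own words rather than from its category label: for every category, count how many words of the entry appear in that category's keyword list. Because this depends only on the text, it is computed the same way for training and test entries. The class GlossaryFeatures and the example categories and keywords are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns a per-category glossary into one numeric feature
// per category, computed from the text alone (never from the label), so the
// same features exist at training and test time.
final class GlossaryFeatures {
    private final List<String> categories;           // fixed feature order
    private final Map<String, Set<String>> glossary; // category -> lower-case keywords

    GlossaryFeatures(List<String> categories, Map<String, Set<String>> glossary) {
        this.categories = categories;
        this.glossary = glossary;
    }

    /** For each category, the number of tokens in text that are its keywords. */
    double[] encode(String text) {
        double[] counts = new double[categories.size()];
        // split on anything that is not a Unicode letter or digit (keeps accents)
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            for (int c = 0; c < categories.size(); c++) {
                Set<String> keywords = glossary.getOrDefault(categories.get(c), Set.of());
                if (keywords.contains(token)) {
                    counts[c] += 1.0;
                }
            }
        }
        return counts;
    }
}
```

These counts could then be added to every feature vector, training and test alike (for example through a dedicated encoder), so the model can learn how much to trust each count instead of relying on a hand-picked weight.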

Has anyone experienced this kind of thing? Do you have any advice? Or is it just a bad idea?


Thanks!

Loic
