> Have you seen the Mahout book?
Yes, I bought your (very good) book in the early access preview. It has helped me a
lot in my investigations.
> ?! If a feature is not found in the production data, then you should not give
> it to the model as a predictor during training. Otherwise, you have a form of
> target leak.
I don't think I explained myself very well, sorry.
I mean that there is no attribute named 'keyword' in my test or training data.
But all the keywords I add when I create my new vector appear in the other
attributes of my data (body and title).
I selected them because I know they will occur very often in body and title.
I was just worried about creating a "fake" attribute name (keyword), like this:
// encode every keyword under the extra predictor 99 with a fixed weight of 700
for (String keyword : keywords) {
    predictorEncoders.get(99).addToVector(keyword, 700, featureVector);
}
(Predictor 99 is a new predictor I created just for these keywords.)
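To make the question concrete, here is a minimal, self-contained sketch of what the loop above does conceptually. It is not Mahout's code: ToyEncoder, its hashing scheme (encoder name plus word mapped to one slot), the two sample keywords, and the vector size of 1000 are all assumptions made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of hashed feature encoding; NOT Mahout's implementation.
class KeywordEncodingSketch {

    // Hypothetical encoder: maps (encoder name, word) to one slot of the vector.
    static final class ToyEncoder {
        private final String name;

        ToyEncoder(String name) {
            this.name = name;
        }

        // Slot index derived from the encoder name and the word together
        // (an assumption of this sketch, chosen for simplicity).
        int slotFor(String word, int size) {
            return Math.floorMod((name + ":" + word).hashCode(), size);
        }

        // Adds `weight` to the slot chosen for `word`.
        void addToVector(String word, double weight, double[] vector) {
            vector[slotFor(word, vector.length)] += weight;
        }
    }

    // Encodes every keyword with the same fixed weight under a "keyword" encoder.
    static double[] encodeKeywords(List<String> keywords, double weight, int size) {
        double[] featureVector = new double[size];
        ToyEncoder keywordEncoder = new ToyEncoder("keyword");
        for (String keyword : keywords) {
            keywordEncoder.addToVector(keyword, weight, featureVector);
        }
        return featureVector;
    }

    public static void main(String[] args) {
        // Sample keywords are made up for the example.
        double[] v = encodeKeywords(Arrays.asList("mahout", "hadoop"), 700, 1000);
        System.out.println("total weight added: " + Arrays.stream(v).sum());
    }
}
```

In this sketch the weight is added for every keyword in the list, whatever the body or title of the document contains.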
But it seems to work (with big weights): the keywords seem to be found in the other
attributes, because when I do this my results in the confusion matrix get
better.
So is it OK to do it like this, or is it still a dirty hack?
Loic