After further experimentation, I discovered that the vectorization of my
data was a major cause of the degradation in the accuracy of the learned
weights. The majority of the features used the same encoder
(StaticWordValueEncoder) with 2 probes. One of the stronger features
collided with both of the weakest features, which caused this strong
feature to "learn" a weak weight.

I increased the vector size substantially and reduced the number of probes
to 1. With the collisions eliminated, I found much more reasonable results.
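For anyone hitting the same problem, a rough sketch of the kind of setup I
mean (the cardinality and feature names here are just placeholders, not my
actual values):

    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

    // A larger cardinality leaves more room in the hashed space, so
    // distinct features are less likely to land in the same slot.
    int cardinality = 1 << 16;  // illustrative size, not a recommendation
    Vector v = new RandomAccessSparseVector(cardinality);

    StaticWordValueEncoder encoder = new StaticWordValueEncoder("myFeature");
    encoder.setProbes(1);  // one probe instead of the two I had been using

    // Each feature value hashes into v; with enough slots and a single
    // probe, collisions between features become rare.
    encoder.addToVector("someFeatureValue", 1.0, v);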

I suppose the lesson is: vectorizing the data this way has substantial
computational performance benefits, but hash collisions can degrade model
accuracy, so the vector size and number of probes are a real trade-off.

