After further experimentation, I discovered that the vectorization of my
data was a major cause of the degradation in the accuracy of the learned
weights. The majority of the features used the same encoder
(StaticWordValueEncoder) with 2 probes. One of the stronger features
collided with both of the weakest features, which caused that strong
feature to assume ("learn") a weak weight.

I increased the vector size substantially and reduced the number of
probes to 1. With the collisions eliminated, I find the results much more
reasonable.

I suppose the lesson is: hashed vectorization of the data has substantial
computational performance benefits; however, a degradation in model
accuracy is a potential trade-off when features collide.
