After further experimentation, I discovered that the vectorization of my
data was a major cause of the degradation in the accuracy of the learned
weights. The majority of the features used the same encoder
(StaticWordValueEncoder) with 2 probes. One of the stronger features
collided with both of the weakest features, which caused this strong
feature to "learn" a weak weight.

I increased the vector size substantially and reduced the number of probes
to 1. With the collisions eliminated, I found much more reasonable results.
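For anyone hitting the same problem, a rough sketch of the kind of setup I
mean (the cardinality and feature names here are just placeholders, not my
actual values):

    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

    // A larger cardinality leaves more room in the hashed space, so
    // distinct features are less likely to land in the same slot.
    int cardinality = 1 << 16;  // illustrative size, not a recommendation
    Vector v = new RandomAccessSparseVector(cardinality);

    StaticWordValueEncoder encoder = new StaticWordValueEncoder("myFeature");
    encoder.setProbes(1);  // one probe instead of the two I had been using

    // Each feature value hashes into v; with enough slots and a single
    // probe, collisions between features become rare.
    encoder.addToVector("someFeatureValue", 1.0, v);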

I suppose the lesson is: vectorizing the data this way has substantial
computational performance benefits, but hash collisions can degrade model
accuracy, so the vector size and number of probes are a real trade-off.

