The problem of correlated features is clearly present in text, but it is 
not so clear what the effect will be. For Naive Bayes, correlation has the 
effect of making the classifier overconfident, but it usually still works 
reasonably well.  For logistic regression without regularization, it can cause 
the learning algorithm to fail (Mahout's logistic regression is regularized, btw). 
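
To make the overconfidence point concrete, here is a minimal sketch in plain 
Python. It is not Mahout code. The probabilities 0.8 and 0.4 and the helper 
names are made up for illustration. The idea is that a pair of words that 
always appear together (think "New" and "York") carries the evidence of one 
word, yet Naive Bayes multiplies the likelihood in once per word.

```python
import math

# Hypothetical likelihoods, chosen only for illustration:
# probability of seeing the word in a document of class A and of class B.
P_WORD_GIVEN_A = 0.8
P_WORD_GIVEN_B = 0.4


def naive_bayes_posterior(likelihoods_a, likelihoods_b, prior_a=0.5):
    """Return P(A | features), treating every feature as independent."""
    log_a = math.log(prior_a) + sum(math.log(p) for p in likelihoods_a)
    log_b = math.log(1.0 - prior_a) + sum(math.log(p) for p in likelihoods_b)
    # Normalize in log space to avoid underflow with many features.
    top = max(log_a, log_b)
    exp_a = math.exp(log_a - top)
    exp_b = math.exp(log_b - top)
    return exp_a / (exp_a + exp_b)


def posterior_with_copies(k):
    """Posterior when the same evidence is counted as k separate features."""
    return naive_bayes_posterior([P_WORD_GIVEN_A] * k, [P_WORD_GIVEN_B] * k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(posterior_with_copies(k), 4))
```

With one copy the posterior for class A is about 0.67, which is the right 
answer here. With two and three perfectly correlated copies it climbs to about 
0.80 and 0.89, even though no new information arrived. The predicted class 
does not change, which is why the classifier usually still works; only the 
reported confidence is inflated.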

Empirical evidence dominates theory in this situation. 


> On Dec 8, 2013, at 9:14, Fernando Santos <[email protected]> 
> wrote:
> 
> Now just a theoretical doubt. In a text classification example, what would
> it mean to have features that are highly correlated?  I mean, in this case
> our features are basically words, do you have an example of how these
> features can not be independent? This concept is not really clear in my
> mind...
