Clustering raw articles vs clustering (Stanford's) NER output

David Noel Mon, 12 May 2014 06:37:28 -0700

I've spent a few weeks tuning Mahout to cluster news articles and have
had decent results, though still not perfect. While thinking of ways
to improve them, I had the idea of running Mahout on the output of
Stanford's Named Entity Recognizer (NER) instead of the articles
themselves and seeing how the two compare. Has anyone tried this? Did
it produce more cohesive clusters?
