Another thing I've seen people do is to threshold on the difference between the scores of the best and second-best topics, so that you only keep documents with a clear winning topic. For estimating the number of topics, you can use cross-validation.
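A minimal sketch of that margin-based filter, combined with the plain max-score threshold from earlier in the thread. It assumes `doc_topic` is a dense (n_documents, n_topics) array of topic scores with at least two topics. The function name, the `min_score` and `min_margin` parameters, and the use of -1 for "no clear topic" are all made up for illustration, not part of any library.

```python
import numpy as np

def assign_confident_topics(doc_topic, min_score=0.0, min_margin=0.0):
    """Return the top topic per document, or -1 where there is no clear winner.

    Hypothetical helper: min_score is the plain threshold on the best score,
    min_margin is the required gap between the best and second-best scores.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    # Sort each row ascending; the last two columns are runner-up and best.
    top_two = np.sort(doc_topic, axis=1)[:, -2:]
    second, best = top_two[:, 0], top_two[:, 1]
    # Keep a document only if it passes both the score and the margin test.
    keep = (best > min_score) & (best - second > min_margin)
    return np.where(keep, doc_topic.argmax(axis=1), -1)

# Toy document-topic scores (made-up numbers, three documents, three topics).
doc_topic = np.array([
    [0.70, 0.20, 0.10],   # clear winner: topic 0
    [0.40, 0.35, 0.25],   # best score too low and margin too small
    [0.05, 0.15, 0.80],   # clear winner: topic 2
])
labels = assign_confident_topics(doc_topic, min_score=0.5, min_margin=0.3)
print(labels)
```

Setting `min_margin=0` reduces this to the simple max-score mask, and setting `min_score=0` leaves only the margin test, so you can try either criterion on its own.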
Vlad

On Wed, Apr 29, 2015 at 12:42 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
> mask with np.max(..., axis=1) > threshold
>
> On 29 April 2015 at 14:35, C K Kashyap <ckkash...@gmail.com> wrote:
>> How can I include a document against its highest ranking topic only if it
>> crosses a threshold?