As Vlad suggests, the number of topics is a hyper-parameter, and you can 
optimize its value using cross-validation.  scikit-learn also has other 
hyper-parameter search tools, such as grid search and randomized search.  
There are also several closely related projects that could wrap your NMF and 
report back the ideal number of topics (see on GitHub: Spearmint, hyperopt).
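
A minimal sketch of the cross-validation idea, under some assumptions of mine: X is a dense, non-negative document-term array, and the score is the reconstruction error on held-out documents. NMF has no built-in score, so that criterion is my choice and not something prescribed by scikit-learn. The helper names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import KFold


def heldout_reconstruction_error(X, n_topics, n_splits=3, seed=0):
    """Mean Frobenius reconstruction error on held-out rows (documents) of X.

    Hypothetical helper: the scoring criterion is an assumption, not an
    official scikit-learn recipe.
    """
    errors = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        model = NMF(n_components=n_topics, init="nndsvda",
                    random_state=seed, max_iter=500)
        # Learn the topic-term matrix (components_) on the training documents.
        model.fit(X[train_idx])
        # Project the held-out documents onto the learned topics.
        W_test = model.transform(X[test_idx])
        residual = X[test_idx] - W_test @ model.components_
        errors.append(np.linalg.norm(residual))
    return float(np.mean(errors))


def pick_n_topics(X, candidates):
    """Return the candidate with the lowest held-out error, plus all scores."""
    scores = {k: heldout_reconstruction_error(X, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

Held-out reconstruction error often keeps shrinking as topics are added, so it may be more useful to look at where the scores stop improving than to take the minimum blindly.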

Hope this helps,


- Lee

On April 29, 2015 at 3:54:49 PM, Vlad Niculae (zephy...@gmail.com) wrote:

Another thing I've seen people do is to threshold based on the  
difference between the scores of the best and second best topics.  
(Only take documents with a clear winning topic.) For estimating the  
number of topics, you can use cross-validation.  
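
A rough sketch of both thresholding ideas from this thread, assuming doc_topic is the (n_documents, n_topics) score matrix such as the output of NMF's fit_transform. The function name, the parameter names, and the -1 "no topic" marker are made up for illustration.

```python
import numpy as np


def assign_topics(doc_topic, min_score=0.0, min_margin=0.0):
    """Return the top topic index per document, or -1 without a clear winner.

    Hypothetical helper: min_score is the threshold on the best score,
    min_margin is the required gap between best and second-best scores.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    # Sort each row ascending so the last column holds the best score.
    ranked = np.sort(doc_topic, axis=1)
    best = ranked[:, -1]
    # With a single topic there is no runner-up; treat its score as zero.
    second = ranked[:, -2] if doc_topic.shape[1] > 1 else np.zeros_like(best)
    keep = (best > min_score) & (best - second >= min_margin)
    return np.where(keep, doc_topic.argmax(axis=1), -1)
```

Setting min_margin=0 leaves only the plain threshold on the maximum score; raising it keeps only documents with a clear winning topic.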

Vlad  

On Wed, Apr 29, 2015 at 12:42 AM, Joel Nothman <joel.noth...@gmail.com> wrote:  
> mask with np.max(..., axis=1) > threshold  
>  
> On 29 April 2015 at 14:35, C K Kashyap <ckkash...@gmail.com> wrote:  

>> How can I include a document against its highest-ranking topic only if it  
>> crosses a threshold?  
>>  

------------------------------------------------------------------------------  
One dashboard for servers and applications across Physical-Virtual-Cloud  
Widest out-of-the-box monitoring support with 50+ applications  
Performance metrics, stats and reports that give you Actionable Insights  
Deep dive visibility with transaction tracing using APM Insight.  
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y  
_______________________________________________  
Scikit-learn-general mailing list  
Scikit-learn-general@lists.sourceforge.net  
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general  
