Thanks Vlad and Lee,
I just found out that the following loop does not list the right
topics:
k = 0
for i in np.argmax(nmf.transform(tfidf), axis=1):
    print("Topic = ", feature_names[i], " ", topic_weights[i])
    print("Document = ", data[k])
    k = k + 1
The complete code is at http://lpaste.net/131727. Could you kindly
review it and give me feedback?
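
One possible cause, sketched under assumptions since only the loop is shown
here: if feature_names comes from the vectorizer, it is indexed by vocabulary
term, while i from the argmax is a topic index, so feature_names[i] would print
an unrelated word. The sketch below looks up each topic's top words through the
topic-term matrix instead, and applies the np.max(..., axis=1) > threshold mask
suggested earlier in the thread. W, H, feature_names, data, threshold and
n_top_words are small made-up stand-ins, not the real data.

```python
import numpy as np

# Hypothetical stand-ins (assumptions, not the original data).
# W plays the role of nmf.transform(tfidf): one row per document,
# one column per topic.
W = np.array([
    [0.90, 0.05],
    [0.10, 0.70],
    [0.30, 0.35],
])
# H plays the role of nmf.components_: one row per topic,
# one column per vocabulary term.
H = np.array([
    [0.8, 0.6, 0.0, 0.1],
    [0.0, 0.1, 0.9, 0.7],
])
feature_names = ["goal", "match", "vote", "party"]
data = ["doc about football", "doc about elections", "doc about both"]

threshold = 0.5   # hypothetical cut-off on the winning topic's score
n_top_words = 2   # how many words to show per topic


def top_words(topic_idx):
    """Return the highest-weighted vocabulary terms for one topic."""
    order = np.argsort(H[topic_idx])[::-1][:n_top_words]
    return [feature_names[j] for j in order]


best_topic = W.argmax(axis=1)   # winning topic index per document
best_score = W.max(axis=1)      # score of that winning topic
keep = best_score > threshold   # mask: documents with a strong enough winner

# Pair each kept document with its topic index and that topic's top words.
assignments = [
    (doc, int(t), top_words(t))
    for doc, t, ok in zip(data, best_topic, keep)
    if ok
]

for doc, t, words in assignments:
    print("Document =", doc, "| Topic =", t, "| Top words =", words)
```

With these toy numbers the third document is dropped because its best score
stays below the cut-off. To require a clear winner instead, the mask could be
built from the gap between the two largest scores in each row of W.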
Regards,
Kashyap
On Thu, Apr 30, 2015 at 1:30 AM, Lee Zamparo <zamp...@gmail.com> wrote:
> As Vlad suggests, the number of topics is a hyper-parameter, and you can
> optimize its value using cross-validation, though I think there are other
> hyper-parameter estimation methods in sklearn as well. There are also many
> closely related projects that could wrap your NMF and report back
> the ideal number of topics (see on GitHub: Spearmint, hyperopt).
>
> Hope this helps,
>
>
> - Lee
>
> On April 29, 2015 at 3:54:49 PM, Vlad Niculae (zephy...@gmail.com) wrote:
>
> Another thing I've seen people do is to threshold based on the
> difference between the scores of the best and second best topics.
> (Only take documents with a clear winning topic.) For estimating the
> number of topics, you can use cross-validation.
>
> Vlad
>
> On Wed, Apr 29, 2015 at 12:42 AM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
> > mask with np.max(..., axis=1) > threshold
> >
> > On 29 April 2015 at 14:35, C K Kashyap <ckkash...@gmail.com> wrote:
>
> >> How can I include a document against its highest-ranking topic only if it
> >> crosses a threshold?
> >>
>
> ------------------------------------------------------------------------------
>
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general