Hi,

Well, it really depends on what you want to do ;) TF-IDF is a measure that originated in information retrieval, where it is used to judge how relevant a document is to a given search term. A term gets a high weight in a document when it occurs often in that document but only rarely in the rest of the corpus.
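To give a feel for what such a vector looks like, here is a minimal plain-Python sketch. It is not the MLlib code: the toy corpus and the helper names `tf_idf` and `rank` are made up for illustration, and the smoothed IDF formula log((m + 1) / (df + 1)) is my reading of the MLlib page you linked, so treat it as an assumption. MLlib also hashes terms to integer indices and returns sparse vectors, whereas this sketch keeps the terms as dictionary keys so the result is easier to read.

```python
import math
from collections import Counter

# Toy corpus, made up for illustration; each document is a list of terms.
docs = [
    "spark makes big data processing simple".split(),
    "tf idf weights terms in a document".split(),
    "spark mllib can compute tf idf".split(),
]

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, assumed here to be log((m + 1) / (df + 1)).
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    # Weight = raw term count in the document times the term's IDF.
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]

def rank(vectors, term):
    """Document indices ordered by the weight of `term`, highest first."""
    return sorted(range(len(vectors)),
                  key=lambda i: vectors[i].get(term, 0.0),
                  reverse=True)

vectors = tf_idf(docs)
print(vectors[2])              # weights for the third document
print(rank(vectors, "mllib"))  # the third document (index 2) comes first
```

In the third document, "mllib" occurs in no other document and so gets a higher weight than "spark", which also appears in the first one. Ranking documents by the weight of a search term is the information-retrieval use in its simplest form.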
It's also often used for text-related machine learning tasks. For example, have a look at topic extraction using non-negative matrix factorization.

Regards,
Jeff

2015-03-09 7:39 GMT+01:00 Xi Shen <davidshe...@gmail.com>:

> Hi,
>
> I read this page,
> http://spark.apache.org/docs/1.2.0/mllib-feature-extraction.html. But I
> am wondering how to use this TF-IDF RDD. What does this TF-IDF vector look
> like?
>
> Can someone give me some guidance?
>
> Thanks,
> Xi Shen