Hi, I want to know whether it is possible to customize the logic of TF-IDF
in Apache Spark.
In typical TF-IDF, the TF is computed for each word within its own document.
For example, the TF of the word "A" can differ between documents D1 and D2,
but I want the TF to be the term frequency across the whole corpus (like a
plain word count). I implemented this using Spark RDDs, but I was wondering
whether it is possible to bring it into Spark's TF-IDF so I can work with
other Spark ML tools such as the normalizer and hashing.

Thanks.
