Hi, I want to know whether it is possible to customize the logic of TF-IDF
in Apache Spark.
In typical TF-IDF, the TF is computed for each word within its own document.
For example, the TF of the word "A" can differ between documents D1 and D2,
but I want the TF to be the term frequency across the whole corpus (like a
plain word count). I implemented this using Spark RDDs, but I was wondering
whether it is possible to bring it into Spark's TF-IDF so I can work with
other Spark ML tools such as the normalizer and hashing.

Thanks.
